PREFACE TO THE SECOND EDITION

The authors are thankful to the readers for the kind reception
which they gave to the second edition. The book has been revised
and a few more examples taken from recent Indian University
Papers have been added at proper places. We hope the present
edition is free from mistakes and misprints, especially in the fully
revised chapters, so that the text with this change will be found to
be more useful to the readers.

We shall gratefully acknowledge suggestions for the
improvement of the book.

Baraut Authors. 


PREFACE TO THE FIRST EDITION

The text is intended mainly for the use of students offering
one of the agriculture subjects (like Agronomy, Agr. Botany and
Agr. Extension) at post graduate level in which Statistics is set as
a compulsory subject. Many of these students have a limited
knowledge of mathematics. The present book, we believe, has a
large number of exercises of all types, firstly to satisfy the need of
those for whom it is intended and secondly to illustrate the theory amply.

The reader is expected to be familiar with the Arithmetic of
school level and to have a limited knowledge of Algebraic symbols. The
writers have purposely avoided proving the formulae lest it should
unnecessarily burden the minds of the students.

It is particularly important that the reader should not form
the impression that Statistical methods contain a series of
incomprehensible formulae to be applied indiscriminately to any
available data. Care should be taken to grasp and
appreciate the ideas and principles underlying the method learnt.
The book has been divided into three parts, namely,

(i) Statistical Methods,

(ii) The Experimental Designs,
and (iii) Official Agricultural Statistics,

and thus contains all the information and topics needed for M. Sc.
Ag. students. These parts have been further sub-divided into
chapters. A large number of unsolved examples have been added
at the end of each chapter for practice. Agra University
Examination papers are also attached at the end of the book for
M. Sc. Ag. students of Agra University.



Writers of a text book are always indebted to all the earlier
books on the subject and therefore we express our gratitude to all
the publishers and writers whose books we have often consulted.

The authors are indebted to the staff members Prof. Fauran
Singh, Prof. H. P. Singh and Prof. O. S. Verm, J. V. College,
Baraut, for their critical comments and valuable suggestions which
have undoubtedly improved the book. In particular, we consider
it our noble duty to express our deep gratitude to reverend Guruji
Prof. SI abend ra Pratap, head of the Statistics Department, J. V.
College, Baraut, who has given his valuable guidance and assistance
at all stages.

We are thankful to Prof. Rajendra Singh, head of the Statistics 
Department, A. S. Jat College, Lakhaoti (Buland Shahar) for his 
assistance in calculations of the problems. 

In spite of our best efforts, a number of misprints and mistakes
are likely to have crept into a book of this type. We shall be grateful
for any notice of these errors and for any suggestion for improving
the book.

Baraut Singh


Veras 
Chapter 1


Classification & Tabulation 


The raw statistical data which is collected in the course of
an inquiry usually consists of a series of readings and measurements.
The original form of the collected data is very complex
and voluminous, and so it is very difficult to draw any inference
from it as such. Thus, it is necessary to condense the data and
present it in a systematic way. This is done by the 'Classification
& Tabulation' of the data. By these processes, the volume and
complexity of the data are reduced, which makes the analysis
and interpretation of the data easier.

Classification: — It is the process of arranging the 
individuals or items into groups or classes according to their 
similarities with respect to some variable character. 

The Advantages of classification:— 

(1) The whole data is divided into a number of classes,
as the items having resemblances with respect to a certain
character are put together in one class. Thus, it indicates the
points of similarity and dissimilarity.

(2) The classification reduces the volume of the data,
which helps in forming a mental picture of the data.

(3) It brings to light important information
which is liable to be ignored without classifying the data.

(4) It prepares the ground for comparisons and inferences.
The classification of the data is not by itself sufficient
for comparisons and interpretations, but it helps in the
tabulation of the data. Thereafter, the comparisons and the
interpretations of the data are possible.

(5) The classification provides an orderly arrangement
of the items of the data.

Basis of classification: 

The classification of the data is based upon the characteristic
possessed by the items contained in the data. There are
two types of characteristics:

(i) Descriptive (ii) Numerical.

(i) Descriptive characteristic: Under this type
of characteristic, quantitative measurements are not
possible; only the presence or absence of a particular
characteristic is observed in the items available in the data. For
example, sex, literacy, unemployment and blindness etc. are
descriptive or qualitative characteristics of the items.

Classification according to Attribute: The
classification of the given data according to a descriptive
characteristic is known as the classification according to attribute.
Here only the presence or absence of a quality (attribute) forms
the basis of classification.

The classification can further be divided into two
kinds:

(a) Simple classification or classification by dichotomy,

(b) Manifold classification.

(a) Simple Classification: The kind of classification
where only one attribute is taken as the criterion of
classification is termed Simple classification. Here, the whole data is
divided into two classes:

(1) the class having those items which possess the attribute
(quality),

(2) the class having those items which do not possess the attribute.

For example, the population of a country can be divided
into two classes according to the attribute 'unemployment',
i.e. unemployed and employed, or according to the attribute
'sex', i.e. male and female, etc.
(b) Manifold Classification: The kind of classification
where more than one attribute is taken as the criterion
of classification is termed Manifold classification. Here the
whole data is divided into more than two classes. For example,
the population of a country can be divided into four classes
according to the two attributes 'sex' and 'literacy', as shown in the
diagram given below.


i 

literate 


Population 

4 


i (Sex) 
Male 

I 

(literacy) j | 

illiterate literate 


I 

Female 

I 

(literacy) 


illiterate 


Thus the sub-divisions of the population are:

(1) literate male (2) illiterate male (3) literate female
(4) illiterate female

Note:-

In classification according to attribute, it is necessary
that the classes should be clearly defined before starting the
actual work of classification, i.e. a boundary should be set
between the classes. For example, in the case of literacy we
must decide in advance whom we shall call literate
and whom illiterate before starting the work of
classification of the data.

(ii) Numerical Characteristic: Under this type of
characteristic, quantitative measurements are possible.
Prices, income, age, yield, height and weight etc. are
examples of numerical characteristics.

Classification according to Class-Interval: The
classification of the given data according to a numerical
characteristic is known as the classification according to the
class-interval.
In the classification according to class-interval, we must
ascertain beforehand whether the data is discontinuous (discrete) or
continuous. The discrete variables are easier to deal with.
An example of a discrete variable is the outcome of the throw of a die.


Example (1): Let a die be thrown 50 times and the number
on its uppermost face be noted each time. Suppose the numbers
obtained are 1, 6, 5, 5, 4, 3, 1, 2, 3, 4, 2, 5, 6, 1, 2, 2, 2,
5, 1, 6, 6, 4, 6, 4, 3, 1, 6, 3, 1, 4, 3, 5, 6, 1, 3, 6, 4, 1, 1, 4,
3, 5, 3, 1, 2, 4, 4, 4, 2, 5. The variable (X) is the number
on the uppermost face of the die in each throw,
and it clearly assumes the values 1, 2, 3, 4, 5 and 6.


In order to classify the data, these six values assumed by
the variable (X) are written in a horizontal row or a
vertical column under the heading "Variable Value (X)".
Each value of X is repeated a number of times; the number
of repetitions of a variable value is called its frequency (f).
In classification, the respective frequencies for each value of
the variable are written against the values of the variable in
the 2nd column under the heading "Frequency (f)". In the
present example, the variable value '1' is repeated 10 times
and the variable value '2' is repeated 7 times, so the respective
frequencies for X = 1 and X = 2 are 10 and 7.


Frequency Table: The table showing the different
values assumed by the variable and their respective frequencies
is called a frequency table.

The frequency table for the given example is shown
below:-
Table No. 1

    Variable value (X)      Frequency (f)
            1                    10
            2                     7
            3                     8
            4                    10
            5                     7
            6                     8
          Total                  50
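
The counting behind Table No. 1 can be checked mechanically. The following is a minimal sketch in Python, not part of the original text; the function name `frequency_table` is chosen here only for illustration, and the list reproduces the 50 throws of Example (1).

```python
from collections import Counter

# The 50 outcomes listed in Example (1).
throws = [
    1, 6, 5, 5, 4, 3, 1, 2, 3, 4, 2, 5, 6, 1, 2, 2, 2,
    5, 1, 6, 6, 4, 6, 4, 3, 1, 6, 3, 1, 4, 3, 5, 6, 1, 3, 6, 4, 1, 1, 4,
    3, 5, 3, 1, 2, 4, 4, 4, 2, 5,
]


def frequency_table(values):
    """Return (variable value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())


table = frequency_table(throws)
for x, f in table:
    print(f"{x:>5} {f:>5}")
print("Total", sum(f for _, f in table))
```

The sum of the frequencies equals the number of throws, which is the check that any frequency table should pass.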


This method is adopted only when the values taken by
the variate are not numerous.

On the other hand, when the number of values taken
by the variate is large and the range between the greatest and
the smallest value is also large, the entire range is
partitioned into classes of appropriate size by drawing
arbitrary dividing lines, and the frequency of the
variate in each class is shown. Each class is defined by two boundaries,
the lower boundary and the upper boundary. The lower
boundary of the class is called the Lower Limit of the class
and the upper boundary is called the Upper Limit of the
class.

Class-Interval: It is the difference between the upper and
the lower limit of the class. For example, for the class 25-35,
the upper limit is 35 and the lower limit is 25, and the
C. I. = 35 - 25 = 10.

Variate Value: The mid-value of the class is called the
variate value, i.e.

$$\text{M.V.} = \frac{\text{Upper limit} + \text{Lower limit}}{2},$$

so in the above mentioned class the M.V. $= \dfrac{35 + 25}{2} = 30$.
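
As a small illustration of these two definitions, the sketch below computes the class-interval and the mid-value for the class 25-35. The function names are hypothetical and are introduced here only for the example.

```python
def class_interval(lower_limit, upper_limit):
    """Class-interval: upper limit minus lower limit."""
    return upper_limit - lower_limit


def mid_value(lower_limit, upper_limit):
    """Variate value of a class: the mean of its two limits."""
    return (upper_limit + lower_limit) / 2


print(class_interval(25, 35))
print(mid_value(25, 35))
```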

Important points for frequency distribution : 

While making a frequency distribution, the following points
should be borne in mind:

(1) The class-intervals should be equal for all classes, to
make the computations easy.

(2) The number of classes should not ordinarily exceed
20 and should not be less than 6. For more than 20
classes, the calculations become heavy and there is not much
gain in accuracy. For less than 6 classes, there may be
a great deal of loss in accuracy.

(3) In general, there should be no class without definite
limits, such as 'under a' or 'over b'.

(4) All the values assigned to a class are treated as
equal to the mid-value of that class. Hence the
class-interval must be chosen in such a way that the average of all
the items in a class does not deviate much from the
mid-value of the class.

(5) A decision should be made beforehand whether the
class 'a-b' contains a, the lower limit of the class, or b, the upper
limit of the class. A note of this should also be given below the table.

(6) The class-interval should be an integer as far as
possible.

Methods of Classification according to Class-interval:- 

There are two methods of classification according to the 
class-interval. 

(1) Exclusive Method (2) Inclusive Method 

(1) Exclusive Method: In this type of classification,
the upper limit of any class is the lower limit of the
succeeding class, i.e. the class-limits are of the type a-b, b-c,
c-d etc. Hence, there is a confusion about those items
which are exactly equal to the class-limits (called border
line items). Usually, the border line items are placed
in the class of which they are the lower limit. Thus the
variable value 'b' is placed in the class b-c. This confusion can
be removed by having classes of the type 'a and under b',
'b and under c' and so on.

(2) Inclusive Method: In this type of classification
there is a gap between the upper limit of a class and the
lower limit of the succeeding class. Hence, it leaves no room for
confusion about the border line items. The classification will
be of the form a-b, c-d, e-f etc., and the class a-b will include
all the items from a to b, inclusive of a and b. This
method cannot be used in the case of a continuous variate.
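
The difference between the two methods can be sketched in code. The example below is an illustration only: the function names, the parameter `step` (the distance between successive lower limits) and both data lists are hypothetical, not taken from the text.

```python
def classify_exclusive(values, start, step, n_classes):
    """Classes 'a and under b': the lower limit belongs to the class,
    the upper limit belongs to the succeeding class."""
    freqs = [0] * n_classes
    for v in values:
        k = int((v - start) // step)
        if 0 <= k < n_classes:
            freqs[k] += 1
    return [(start + k * step, start + (k + 1) * step, freqs[k])
            for k in range(n_classes)]


def classify_inclusive(values, start, step, n_classes):
    """Classes a-b, c-d, ... for whole-number data: both limits belong
    to the class, so each class ends one unit before the next begins."""
    freqs = [0] * n_classes
    for v in values:
        k = (v - start) // step
        if 0 <= k < n_classes:
            freqs[k] += 1
    return [(start + k * step, start + (k + 1) * step - 1, freqs[k])
            for k in range(n_classes)]


# Hypothetical continuous readings: 20 and 30 are border line items.
readings = [10, 12.5, 20, 29.9, 30, 35]
print(classify_exclusive(readings, start=10, step=10, n_classes=3))

# Hypothetical whole-number marks.
marks = [0, 4, 5, 9, 10, 14]
print(classify_inclusive(marks, start=0, step=5, n_classes=3))
```

In the exclusive sketch the border line items 20 and 30 fall in the classes 20-30 and 30-40, i.e. in the class of which they are the lower limit.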
Rules for making a frequency distribution: 

(1) Before starting the actual work of classification, a
preliminary inspection of the data should be made and the
difference between the highest and the lowest values should be divided
by the desired number of classes to be formed. It will give
an approximate idea of the class-interval and will also help in
setting the class limits.

(2) After deciding about the class-interval and class-limits,
a table containing three headings, namely class-interval, tally
marks and frequency, should be prepared.

(3) Read the items in the raw table one by one and for each
item put a stroke '|' against its corresponding class in the column
headed Tally marks. After putting 4 strokes of this type,
the fifth is obtained by crossing the 4 strokes, thus forming
a group of 5, which helps in counting at the end.

(4) The sum of the tally marks is written in the column
of frequency against the respective classes.

(5) As a check, the sum of all the frequencies (Σf) must
be equal to the total number of variate-values.
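
Rules (1) and (3) lend themselves to a short sketch. The code below is illustrative only: the function names are hypothetical, rounding the class-interval up to a whole number is an assumption made here (the text asks only for an approximate idea), and the string `||||/` is this sketch's way of writing four strokes crossed by a fifth.

```python
import math


def suggested_class_interval(lowest, highest, n_classes):
    """Rule (1): range divided by the desired number of classes.
    Rounding up to an integer is an assumption of this sketch."""
    return math.ceil((highest - lowest) / n_classes)


def tally(count):
    """Rule (3): strokes in groups of five; '||||/' stands for
    four strokes crossed by the fifth."""
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


print(suggested_class_interval(0, 47, 10))
for f in (4, 5, 7, 11):
    print(f, tally(f))
```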
Example (2)

The marks obtained by 60 students in the paper of
statistics are given below:

22 47 [illegible] 42 31 17 13 15 18 25 35 33

38 15 0 33 10 34 29 26 16 33 16 27

24 22 26 19 14 36 18 25 21 12 21 10

28 25 17 38 10 3 31 24 3 2 36 18

26 29 27 39 28 35 26 27 13 33 25 18

Arrange the data in the form of a frequency distribution.

Solution:

Here the variate is the marks obtained in statistics and the
original form of the data is ungrouped. If we arrange
the data in the form of a table as shown in Example (1), then
there are as many as 29 values of the variable, some of them
occurring once, some twice, some thrice and some 4 times
at the most.

Though this is a frequency distribution giving a clearer idea
than the ungrouped data, it may be improved
by classifying it into groups. From an inspection of the
data it is obvious that the range is 47 - 0 = 47, because the
highest value is 47 and the lowest value is 0. Further, the variable
does not assume a fractional value and is a discrete variate
having gaps between its successive values. The lowest value 0
suggests that we must start with zero as the lower limit and
continue by intervals of 4 marks, adopting the Inclusive Method of
classification.

The frequency table will be formed as given below:
Table No. 2

    Class-interval    Tally marks              Frequency (f)
        0-4           ||||                           4
        5-9           |                              1
       10-14          ||||/ ||                       7
       15-19          ||||/ ||||/ |                 11
       20-24          ||||/ |                        6
       25-29          ||||/ ||||/ ||||/ |           16
       30-34          ||||/ ||                       7
       35-39          ||||/ |                        6
       40-44          |                              1
       45-49          |                              1
       Totals                                    Σf = 60

(Here ||||/ stands for a group of five: four strokes crossed by the fifth.)





In a similar manner, we can form the frequency table
(distribution) when the ungrouped data pertains
to a continuous variate; there the Exclusive Method of
classification will be used.

Example (3) 

The following table gives the Vickers Hardness number
of 20 shell cases:

    60.3   61.3   62.7   60.4   60.2
    64.5   66.5   62.9   61.5   67.8
    65.0   62.7   62.2   64.8   63.8
    62.2   67.5   67.5   60.9   63.8

Arrange the data in the form of a frequency distribution.
Solution:

The largest value is 67.8 and the lowest is 60.2, so the range
is 67.8 - 60.2 = 7.6; this suggests that we start with 60.0 as our
lower limit and continue by a class-interval of 1 unit. The
variable is the Hardness Number and clearly it is a continuous
variate.

The frequency table is given below:

Table No. 3

    Class-interval    Tally marks    Frequency (f)
        60-61         |||                  3
        61-62         ||                   2
        62-63         ||||/                5
        63-64         |                    1
        64-65         ||                   2
        65-66         ||                   2
        66-67         ||                   2
        67-68         |||                  3
        Totals                         Σf = 20


Cumulative Frequency Table: In some statistical
investigations (educational tests, wages or salary
statistics etc.), we require the number of variates which are
'less than' or 'more than' a given value. For this purpose, it is
necessary to change an ordinary frequency table into a
cumulative frequency table. It can be done in the following two
ways:
(1) Less than type:

Suppose we are given the daily expenses for one week
as follows:

Table No. 4

    Days          Expenses (Rs.)
    Monday              4
    Tuesday             6
    Wednesday          10
    Thursday           16
    Friday             12
    Saturday            8
    Sunday              4


Now if we want to know the total expenses up to
Wednesday, it will be given by the sum of the expenses
for Monday, Tuesday and Wednesday, i.e. 4 + 6 + 10 = 20.
Similarly the total expenses incurred up to any day of the week
can be obtained by adding the expenses from Monday up to
that day. In tabular form, the total expenses can be
represented as given below in Table No. 5.
Table No. 5

    Days          Expenses up to the day (Rs.)
    Monday              4
    Tuesday            10
    Wednesday          20
    Thursday           36
    Friday             48
    Saturday           56
    Sunday             60


On the other hand, the expenses after any day (including
that day) will be obtained by adding the expenses for that
day and afterwards. The following Table No. 6 serves this
purpose.

Table No. 6

    Days          Expenses after the day (Rs.)
    Monday             60
    Tuesday            56
    Wednesday          50
    Thursday           40
    Friday             24
    Saturday           12
    Sunday              4
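
Both running totals follow directly from the daily expenses of Table No. 4. The sketch below is an illustration in Python, not part of the original text; the variable names are chosen here for the example.

```python
from itertools import accumulate

days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
expenses = [4, 6, 10, 16, 12, 8, 4]  # Table No. 4, in Rs.

# Expenses up to and including each day: running sum from Monday.
up_to_day = list(accumulate(expenses))

# Expenses from each day onwards: running sum taken from Sunday backwards.
from_day_onwards = list(accumulate(reversed(expenses)))[::-1]

for day, less, more in zip(days, up_to_day, from_day_onwards):
    print(f"{day:<10} {less:>4} {more:>4}")
```

The first running total ends at the week's total and the second begins with it, which is a convenient check on both columns.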
If the days are replaced by the variate x, with its values
$x_1, x_2, x_3, \ldots$, then the values in the 2nd column of Table
No. 5 show the number 'less than' the variate value and
the 2nd column of Table No. 6 shows the number 'more than'
the variate value.

These frequencies, which are less than or more than the
variate value (as shown in the second column of Tables No. 5 and 6),
are called the Cumulative Frequencies.

A similar procedure is adopted when, instead of the
variate values, the classes are given.

Example No. 4 

Form a cumulative frequency table from the following 
data: — 

Class:      0-3  3-6  6-10  10-12  12-15  15-19  19-20  20-23  23-25
Frequency:   2    4    5      7      8     11     12     14     10

Solution: — 

(i) In making the cumulative frequency table of the "Less
than type", the first class will be replaced by 'under 3', the second
class by 'under 6' and so on. The cumulative frequency for each
class will be the sum of the frequencies up to and including that
class. The cumulative frequency table of the "Less
than type" is given below:

Table No. 7

    Variate value     Cumulative frequency (c. f.)
    Under 3                   2
    Under 6                   6
    Under 10                 11
    Under 12                 18
    Under 15                 26
    Under 19                 37
    Under 20                 49
    Under 23                 63
    Under 25                 73
(2) More than type:

(ii) In forming the cumulative frequency table of the "More
than type", the first class is replaced by 'more than 0', the second
by 'more than 3' and so on. The cumulative frequency for each
class is the sum of the frequencies from that class onwards.
The cumulative frequency table of the "More than type"
is given below:


Table No. 8

    Variate value     Cumulative frequency (c. f.)
    More than 0              73
    More than 3              71
    More than 6              67
    More than 10             62
    More than 12             55
    More than 15             47
    More than 19             36
    More than 20             24
    More than 23             10
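
The two cumulative tables of Example No. 4 can be rebuilt from the class limits and frequencies given in the question. The following is a minimal sketch; the function names `less_than_table` and `more_than_table` are hypothetical and serve only this illustration.

```python
from itertools import accumulate

# Classes and frequencies of Example No. 4.
classes = [(0, 3), (3, 6), (6, 10), (10, 12), (12, 15),
           (15, 19), (19, 20), (20, 23), (23, 25)]
frequencies = [2, 4, 5, 7, 8, 11, 12, 14, 10]


def less_than_table(classes, frequencies):
    """Label each class by its upper limit; sum frequencies up to it."""
    running = accumulate(frequencies)
    return [(f"Under {upper}", cf)
            for (_, upper), cf in zip(classes, running)]


def more_than_table(classes, frequencies):
    """Label each class by its lower limit; sum frequencies from it onwards."""
    running = list(accumulate(reversed(frequencies)))[::-1]
    return [(f"More than {lower}", cf)
            for (lower, _), cf in zip(classes, running)]


for row in less_than_table(classes, frequencies):
    print(row)
for row in more_than_table(classes, frequencies):
    print(row)
```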


Note (1):-

From the above example, it is clear that in the "Less
than type" c. freq. table we have to replace the classes
by 'under' or 'below' the upper limits of the classes, and
in the "More than type" c. freq. table we have to replace
the classes by 'more than' or 'over' the lower limits of the
classes.

Forming an ordinary freq. table from a c. freq.
table:

There are cases when we want to change a cumulative
frequency table into an ordinary frequency table. For example,
if for every unit of consumption of a certain commodity its
total utility is given, and from it we want to know the marginal
utility for every unit of consumption, then it will be a case
of forming an ordinary freq. table from a cumulative one.
Example (5) 

In the table given below, the total utility for each unit of
consumption of a certain commodity is given. Find the
marginal utility for every unit of consumption.

Unit:           1   2   3   4   5   6   7   8   9   10   11   12

Total utility:  3  11  21  33  49  63  73  81  98  103  107  108

Solution: — 

We know that:

the total utility for the $i$th unit = the marginal utility for the 1st unit
+ ... + the marginal utility for the $i$th unit.

From this, it is clear that the relation between the total
utility and the marginal utility is the same as that between the
cumulative frequency and the ordinary frequencies.

Here we have to convert a cumulative frequency table
into an ordinary frequency table, as shown below:

Table No. 9

    Unit    Marginal utility        Unit    Marginal utility
      1            3                  7           10
      2            8                  8            8
      3           10                  9           17
      4           12                 10            5
      5           16                 11            4
      6           14                 12            1
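
The marginal utilities of Table No. 9 are successive differences of the total utilities given in the question. A short sketch of that computation follows; the variable names are chosen here for illustration.

```python
from itertools import accumulate

# Total utility for units 1 to 12, as given in Example (5).
total_utility = [3, 11, 21, 33, 49, 63, 73, 81, 98, 103, 107, 108]

# The first marginal utility equals the first total; every later one
# is the difference between successive totals.
marginal_utility = [total_utility[0]] + [
    later - earlier
    for earlier, later in zip(total_utility, total_utility[1:])
]

for unit, mu in enumerate(marginal_utility, start=1):
    print(unit, mu)

# Check: accumulating the marginal utilities gives back the totals.
print(list(accumulate(marginal_utility)) == total_utility)
```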
Note (2):-

To form the ordinary frequency table from a cumulative
frequency table of the "less than type" we have:

frequency for the $i$th class = c. f. for the $i$th class $-$ c. f. for
the $(i-1)$th class.

Similarly, to convert a c. f. table of the "more than type"
into an ordinary frequency table we have:

frequency for the $i$th class = c. f. for the $i$th class $-$ c. f.
for the $(i+1)$th class.

Example (6) 

From the following data, find the number of persons in
the classes 20-25, 25-30 and 55-60.

Age more than:    20   25   30   35   40   45   50   55

No. of persons:  800  750  680  580  400  250  130   60

Solution:— 

We have been given that the number of persons who
are older than 20 is 800 and the number who are older than 25 is 750.
Thus the no. of persons between the ages of 20 and 25
is 800 - 750 = 50 only. In the same way, the no. of persons
between 25-30, 30-35 and 55-60 can be found,
because the formula used is:

frequency for the $i$th class = c. f. for the $i$th class $-$ c. f. for
the $(i+1)$th class.

Thus the following table shows the no. of persons
in the different age-groups.
Table No. 10

    Age-group     No. of persons
     20-25              50
     25-30              70
     30-35             100
     35-40             180
     40-45             150
     45-50             120
     50-55              70
     55-60              60
     Total             800


Example (7) 

Obtain the ordinary frequency table from each of the
following tables:

              (1)                               (2)
    Marks        No. of Students      Marks        No. of Students
    Below 10           3              Above 0            30
    Below 20           8              Above 10           26
    Below 30          17              Above 20           21
    Below 40          20              Above 30           14
    Below 50          22              Above 40           10
                                      Above 50            0

Solution:

(1) Here the c. f. table of the 'Less than type' is given and
we have to convert it into an ordinary frequency table. The
frequency for the $i$th class = c. f. for the $i$th class $-$ c. f. for the
$(i-1)$th class. Thus, the table given below shows the desired no. of
students in the different groups of marks:
Table No. 11

    Marks       No. of students
     0-10             3
    10-20             5
    20-30             9
    30-40             3
    40-50             2
    Total            22


(2) Here the c. f. table of the "More than type" is given
and we have to convert it into an ordinary frequency table.
The frequency for the $i$th class = c. f. for the $i$th class $-$ c. f. for
the $(i+1)$th class. Thus, the table given below shows the desired no. of
students in the different groups of marks:

Table No. 12

    Marks       No. of students
     0-10             4
    10-20             5
    20-30             7
    30-40             4
    40-50            10
    Total            30
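
The two conversion formulae of Note (2) can be sketched as a pair of small functions and applied to Examples (6) and (7). The function names are hypothetical. For Example (6) the sketch appends a closing entry of 0 persons above 60; that entry is an assumption made here so that the last class 55-60 has an upper end, in line with Table No. 10.

```python
def from_less_than(cumulative):
    """f_i = c.f._i - c.f._(i-1); the first class keeps its own c.f."""
    return [cumulative[0]] + [
        later - earlier
        for earlier, later in zip(cumulative, cumulative[1:])
    ]


def from_more_than(cumulative):
    """f_i = c.f._i - c.f._(i+1); the last entry only closes the top class."""
    return [here - after
            for here, after in zip(cumulative, cumulative[1:])]


# Example (7), part (1): marks below 10, 20, 30, 40, 50.
print(from_less_than([3, 8, 17, 20, 22]))

# Example (7), part (2): marks above 0, 10, 20, 30, 40, 50.
print(from_more_than([30, 26, 21, 14, 10, 0]))

# Example (6): age more than 20, 25, ..., 55, plus an assumed
# closing entry of 0 persons above 60.
print(from_more_than([800, 750, 680, 580, 400, 250, 130, 60, 0]))
```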
Tabulation:

It has been mentioned already that classification
alone is not sufficient for comparisons and inferences. The data
is further put to the treatment of tabulation, which makes it
fit for further analysis and comparisons.

Thus Tabulation is the process of arranging the
classified data in an orderly manner into rows and columns in
such a way that it makes clear all the important information
contained in the data.

For tabulation, the following points should be kept in
mind:

(1) Every table should have a title, explanatory in itself,
and it should provide information regarding:

(a) What the data are. (b) Where the data are.

(c) Principle of classification. (d) Time of data.

(2) The rows and columns must be arranged in a logical
order to facilitate comparisons.

(3) The headings and sub-headings should be concise and
without any ambiguity.

(4) The units of the data presented should be clear.

(5) Complicated tables should be avoided; they
must be broken up into more than one simple
table.

(6) The size of the table should suit the size of the paper.

(7) The table should be presented in a neat, clean and
comprehensive way by drawing double lines wherever
required.

(8) The source of the table should also be written.

Types of Tabulation 

There are two types of tabulation:

(1) Simple tabulation (2) Complex tabulation.

(1) Simple Tabulation:

In a simple tabulation, the data is presented with respect
to one character only; in other words, a simple table is
capable of providing the information relating to one character
only. For example, the population of India can be divided
according to religion. The table for this purpose will be as
follows:


Table No. 13

    Religion          Population
    1. Hindu             ...
    2. Muslim            ...
    3. Sikh              ...
    4. Christian         ...
    5. Jain              ...
    6. Others            ...
    Total                ...


(2) Complex Tabulation:

A complex table gives the information relating to
several characters. This type of tabulation is further classified
as 'Double Tabulation', 'Triple or Treble Tabulation' and
'Manifold Tabulation'.

Double or two fold Tabulation:

In this type of tabulation, the data is divided according
to two characteristics. For example, the population of India
can be divided according to religion and States. The table for
it is given below:
Table No. 14

                                    Population
    States       Hindu   Muslim   Sikh   Christian   Jain   Others   Total
    1. Assam      ...     ...     ...      ...       ...     ...      ...
    2. Bihar      ...     ...     ...      ...       ...     ...      ...
    3. Punjab     ...     ...     ...      ...       ...     ...      ...
    4. U. P.      ...     ...     ...      ...       ...     ...      ...
    Total         ...     ...     ...      ...       ...     ...      ...
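
A double tabulation of this kind can be sketched as a two-way count with row and column totals. The records below are hypothetical individuals invented purely for illustration (they are not census figures), and the function name `two_way_table` is likewise an assumption of this sketch.

```python
from collections import Counter

# Hypothetical records, one (State, religion) pair per person.
records = [
    ("Assam", "Hindu"), ("Assam", "Muslim"),
    ("Bihar", "Hindu"), ("Bihar", "Hindu"),
    ("Punjab", "Sikh"),
]


def two_way_table(records):
    """Cross-classify records by two characteristics and add totals."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals


table, row_totals, col_totals = two_way_table(records)
for state, cells in table.items():
    print(state, cells, "Total:", row_totals[state])
print("Total", col_totals, "Grand total:", sum(row_totals.values()))
```

The grand total obtained from the row totals must agree with that from the column totals, which is the usual check on a two-fold table.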





Treble Tabulation:

In this type of tabulation, the data is sub-divided according
to three characteristics. Treble tabulation is capable of
answering three mutually dependent questions. The table given
below is an example of treble tabulation. It shows the distribution
of India's population according to States, religion and
sex.
Table No. 15

[Table not reproduced.]

Manifold Tabulation:

In this type of tabulation, the data is divided with respect
to more than three characteristics. From this type of table, we
can get the information relating to several (more than three)
characteristics. An example of this type of tabulation is the
division of India's population according to States, religion,
sex, literacy and age etc.

Example (8)

Draw up in detail, with proper attention to spacing,
double lines etc. and showing all the sub-totals, a blank table
in which could be entered the numbers occupied in six
industries at two dates (July 63, July 64), distinguishing males
from females and, among the latter, single, married and widowed.

Solution: — 

The desired table is given below: — 

Table No. 16

[Blank table not reproduced. Surviving column headings:
No. in July, 63 | No. in July, 64 | Totals]

The above table shows the no. of employees in six
industries according to their distribution by sex and by the two dates.

Example (9)

Draw up two independent blank tables, giving rows,
columns and totals in each case, summarizing the details about
the number of families, distinguishing males from females,
earners from dependents and adults from children.
Solution:

The desired table showing all the required information
will be a complex table and it will be tedious to understand it.
For the sake of convenience we partition this table into two
simple tables:

(i) giving the distribution of all the male members in
the families according to age and earning;

(ii) giving the distribution of all the female members in
the families according to age and earning.

(i) Table showing the distribution of male members
according to age and earning:


Table No. 17 (i)

                     Earner                     Dependent                     Totals
    Family   Adults  Children  Total    Adults  Children  Total    Adults  Children  Total
    1.
    2.
    3.
    4.
    5.
    6.
    Totals










Chapter II 


Graphs And Diagrams 


After condensing and summarizing the complex and
numerous data in a systematic manner by means of classification
and tabulation, we require certain devices which may present
the condensed form of the data in such a way that it
becomes at once comparable and
leaves a lasting impression on the mind of the observer.
One of the methods of making the data intelligible is to
represent it by means of graphs and diagrams. The graphic
and diagrammatic representation of the data is always appealing
to the eye as well as to the mind of the observer.

Advantages of Graphs & Diagrams: — 

(1) The graphs and diagrams represent the data in an
attractive and appealing way, both to the eye and to the mind.

(2) The graphic and diagrammatic representation of the
data leaves a lasting effect on the mind.

(3) The diagrams are not only attractive and impressive
but they save time also, because through the diagrams it is
possible to have an immediate grasp of significance.

(4) Another merit of the diagrams is the ease with
which two sets of data are compared with each other.

(5) Forecasting becomes easier with the help of the
graphs.

(6) The graphs are helpful in interpolation also, and
they give an indication of correlation between two variables.

(7) The partitioning values (median, quartiles etc.)
and the mode are also determined by graphs.

Limitations of graphs & diagrams:— 

(1) The graphs and diagrams provide an approximate
picture of the data and not an accurate one. Thus they are
useful in presentation to the general public but not to a
statistician or an investigator who is interested in a
detailed study.

(2) They can be used to compare only such data as are
technically comparable.

(3) They are easily capable of misuse.

Graphical Representation of Frequency Distributions:

In the graphical representation of frequency distributions,
the horizontal axis of the graph is used to show the
variate values and the vertical axis the frequencies of the
variate. The graphs of frequency distributions are of the
following types:

(1) Histogram (2) Frequency polygon and frequency
curve (3) Cumulative frequency curve or ogive.

(1) Histogram: It is composed of a set of rectangles,
one over each class-interval on the X-axis. The width of each
rectangle is taken proportional to the class-interval, and in
the case of equal intervals its height is taken proportional
to the frequency. In the case of unequal class-intervals the
height of a rectangle is taken proportional to the ratio of
the frequency to the width of the class-interval, so that the
area of the rectangle is proportional to the frequency of the class.
Example (1):

The following is the frequency distribution of the yield
of sugar-cane in tons per acre.

Class     : 35-40  40-45  45-50  50-55  55-60  60-65  65-70

Frequency :   7      8     12     26     32     15      9

Draw a histogram representing the above distribution.

Solution:

In this example, the class-intervals are equal and so the
heights of the rectangles will be proportional to the frequencies.

Graph No. 1

HISTOGRAM

"representing the yield of sugar-cane in tons/acre"

Example No. (2):

Draw the histogram of the following distribution:

Heights (in inches)     : 48-50  50-54  54-56  56-58  58-59  59-60  60-63

No. of boys (frequency) :  12     60     20     16      5      2      3

Solution:

In this example, the class-intervals are not equal, so the
heights of the rectangles will be proportional to the ratio of
the frequencies to their respective class-intervals. Thus the
heights of the rectangles are given in the following table
against the respective classes:

Table No. 18

Classes        Frequency / Class-interval

48-50              12/2 =  6
50-54              60/4 = 15
54-56              20/2 = 10
56-58              16/2 =  8
58-59               5/1 =  5
59-60               2/1 =  2
60-63               3/3 =  1
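
The heights in Table No. 18 are simply the frequencies divided by the widths of their classes. The following is a minimal Python sketch of that calculation, not part of the original text; the function name `histogram_heights` is illustrative, and the third class is taken as 54-56, as it appears in the table.

```python
def histogram_heights(classes, frequencies):
    """Return frequency / class width for each class (the rectangle heights)."""
    heights = []
    for (lower, upper), frequency in zip(classes, frequencies):
        width = upper - lower          # width of the class-interval
        heights.append(frequency / width)
    return heights


# Data of Example (2): heights of boys in inches.
classes = [(48, 50), (50, 54), (54, 56), (56, 58), (58, 59), (59, 60), (60, 63)]
frequencies = [12, 60, 20, 16, 5, 2, 3]

heights = histogram_heights(classes, frequencies)
for (lower, upper), height in zip(classes, heights):
    print(f"{lower}-{upper}: {height:g}")
```

Because each height is multiplied back by its class width when the rectangle is drawn, the area of every rectangle equals the class frequency.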


Graph No. 2

HISTOGRAM

"representing the heights of the boys"

V.S.: 1 small div. = 1 unit of frequency

(2) Frequency Polygon & Frequency Curve: The
frequency polygon is constructed by joining, by means of
straight lines, the points whose abscissae are the mid-points of
the classes and whose ordinates are the corresponding
frequencies. A frequency polygon can also be obtained by joining
the mid-points of the upper sides of the rectangles of a
histogram by straight lines.

A frequency curve is a smooth, free-hand curve drawn
through the points obtained for the frequency polygon. The
area under the curve should be equal to that of the histogram.
The frequency polygon is used to find the 'mode', which is the
variate value at the apex of the curve.

Example (3a):

Draw a frequency polygon & a frequency curve for the
data in Example (1).

Solution:

In this example, the mid-points of the classes are plotted on
the X-axis and the frequencies on the Y-axis.

Graph No. 3 (a)

Representing the yield of sugar-cane in tons/acre.

Example 3 (b):

The following table gives the distribution of height in
inches for 100 students.

Interval                Frequency

Over 57 up to 60             3
Over 60 up to 63            12
Over 63 up to 66            31
Over 66 up to 69            37
Over 69 up to 72            16
Over 72 up to 75             1

Total                      100

Represent the data in the form of a histogram as well
as a frequency polygon.

Solution:

See Graph No. 3 (b).

Graph No. 3 (b)

HISTOGRAM & F. POLYGON

Representing the distribution of height for 100 students

(3) Cumulative frequency curve or ogive: The
ogive is constructed from a less-than type cumulative frequency
table. The upper limits of the classes are taken as abscissae
and the corresponding cumulative frequencies as ordinates, and
all these points are plotted on the graph. The smooth free-hand
curve through these points is called the ogive.
The ogive is helpful in determining the partitioning values
(median, quartiles etc.).

Example (4):

Draw a cumulative frequency curve for the data given
in Example (1).

Solution:

First of all, we construct the cumulative frequency
table from the given data as follows:

Table No. 19

Variate value        Cumulative frequency (c.f.)

Under 40                     7
Under 45                    15
Under 50                    27
Under 55                    53
Under 60                    85
Under 65                   100
Under 70                   109

The cumulative frequency curve is shown below:

Graph No. 4

Representing the yield of sugar-cane in tons/acre
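
The running totals of Table No. 19, and the way a partitioning value is read from the ogive, can be sketched in Python as follows. This is only an illustration: the helper `value_at` is a hypothetical name, and it joins the plotted points by straight lines, which approximates a reading taken from the free-hand curve. The starting point (35, 0) is assumed from the lower limit of the first class.

```python
from itertools import accumulate

# Data of Example (1): yield of sugar-cane in tons per acre.
upper_limits = [40, 45, 50, 55, 60, 65, 70]
frequencies = [7, 8, 12, 26, 32, 15, 9]

# Less-than cumulative frequencies: running totals of the frequencies.
cumulative = list(accumulate(frequencies))

# Points of the ogive: (upper class limit, cumulative frequency).
ogive_points = list(zip(upper_limits, cumulative))


def value_at(target, points):
    """Variate value at which the ogive reaches `target`, by linear interpolation."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target lies outside the plotted range")


# The curve starts at the lower limit of the first class with c.f. 0.
points = [(35, 0)] + ogive_points
median = value_at(cumulative[-1] / 2, points)   # value at c.f. = N/2

print(cumulative)
print(median)
```

With N = 109 the median is read at a cumulative frequency of 54.5, which falls in the class 55-60.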



Gap between the upper & the lower limits of the
adjacent classes:

In the case of grouped data of the type a-b, b-c,
c-d and so on, we have seen that the graphs of the above
three types can easily be used. But where there is a
gap between the upper and the lower limits of the adjacent
classes, the class limits should be so modified that the gap
vanishes and the upper & lower limits of the adjacent classes
become the same.

Example (5):

Draw the histogram, frequency polygon and the ogive
for the following data:

Scores          : 20-29  30-39  40-49  50-59  60-69  70-79

No. of students :   2     14     22     20     14      3

Solution:

In this example, there is a gap between the upper & the
lower limits of the adjacent classes. Hence, we first modify the
classes to obtain the continuous form of the data and thus close
the gap. For the construction of the histogram, frequency polygon
& the cumulative frequency curve for these data, we form the
following table:

Table No. 20

Classes         Frequency (f)     Cumulative frequency (c.f.)

19.5-29.5             2                    2
29.5-39.5            14                   16
39.5-49.5            22                   38
49.5-59.5            20                   58
59.5-69.5            14                   72
69.5-79.5             3                   75
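
The modification of the class limits used for Table No. 20 can be sketched in Python as below. The function name `to_class_boundaries` is hypothetical, and the sketch assumes that the gap between adjacent classes is the same throughout, as it is in Example (5), where each limit is moved by half of the gap of 1.

```python
from itertools import accumulate


def to_class_boundaries(class_limits):
    """Close the gap between adjacent classes by moving each limit half the gap."""
    gap = class_limits[1][0] - class_limits[0][1]   # e.g. 30 - 29 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in class_limits]


# Data of Example (5): scores and number of students.
limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
frequencies = [2, 14, 22, 20, 14, 3]

boundaries = to_class_boundaries(limits)
cumulative = list(accumulate(frequencies))

for (lower, upper), f, cf in zip(boundaries, frequencies, cumulative):
    print(f"{lower}-{upper}  f={f}  c.f.={cf}")
```

After the adjustment the upper boundary of every class coincides with the lower boundary of the next, so the rectangles of the histogram touch one another.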



Graph No. 5 (a)

HISTOGRAM & F. POLYGON

"representing the scores for students"

H.S.: 1 small div. = 2 units of score

Graph No. 5 (b)

Cumulative Frequency Curve or OGIVE

Graph for Discrete Variable: When the data are
given for a discontinuous variable together with its
corresponding frequencies, the graphical representation of
such data is made by drawing thick lines parallel to the
Y-axis. These lines are drawn at the points of the discrete
values of the variable shown on the X-axis, and their heights are
proportional to their corresponding frequencies.
Example No. (6):

Represent graphically the following frequency distribution:

X :  1    2    3    4    5    6

f : 20   15   17   19   13   16

Solution:

In the graph, the variate values are shown on the X-axis
and their corresponding frequencies on the Y-axis.


Graph No. 6

"Graph for discrete variable"

H.S.: 5 small divs. = 1 unit of variate value

Graph for Time Series (Historigram): The
historigram is the graph obtained from time series data.
In these graphs, the time variable 't' is measured along the X-axis
and its corresponding values 'u_t' along the Y-axis. All the
available points from the data are plotted on the graph and
joined by means of straight lines.

The following few examples will illustrate this type of
graph:

Example No. (7):

The following table provides the yearly figures of
production of sugar in lakh tons:

Year       Production of sugar
           (lakh tons)
 t              u_t

1948            9.65
1949            8.90
1950            9.75
1951           10.50
1952           11.25
1953           10.70
1954           12.35
1955           12.55
1956           12.95
1957           13.00
1958           13.50

(i) Draw the graph to show the fluctuations of the
production of sugar.

(ii) Comment on the fluctuations of the production.

Solution:

(i) The production of sugar in lakh tons is taken
along the Y-axis and the years along the X-axis in the following graph.



Representing the production of sugar in lakh tons, from 1948 to 1958

Example No. (8):

Represent the figures given below on graph paper and
comment on the trend shown by the data.

Year       Price of Arhar
           in Rs./maund

1929            4.0
1930            4.6
1931            3.6
1932            3.6
1933            3.3
1934            3.3
1935            4.7
1936            3.4
1937            4.3
1938            4.3
1939            4.2
1940            3.9

Solution:

An eye inspection of the graph shows that the prices
are neither decreasing nor increasing but are fluctuating
around the price Rs. 3.9/maund, the mean value.
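
The mean value quoted in the solution can be checked with a short Python sketch. The prices are those of the table in Example (8), read with one decimal place; the variable names are illustrative only.

```python
from statistics import mean

# Price of Arhar in Rs./maund, Example (8).
prices = {
    1929: 4.0, 1930: 4.6, 1931: 3.6, 1932: 3.6,
    1933: 3.3, 1934: 3.3, 1935: 4.7, 1936: 3.4,
    1937: 4.3, 1938: 4.3, 1939: 4.2, 1940: 3.9,
}

average = mean(prices.values())

# Years in which the price lies above and below the mean.
above = [year for year, price in prices.items() if price > average]
below = [year for year, price in prices.items() if price < average]

print(round(average, 1))
print(above, below)
```

The years above and below the mean are scattered over the whole period, which is what "fluctuating around the mean" amounts to in this series.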
Example No. (9):

Draw the graph from the following table, so that the
growth of the three kinds of wheat may be read from the
graph. Discuss the growth of the three kinds of wheat.

(U. P. Board 1956)

               Height in centimetres at age (days)
Kinds        15     30     45     60     75     90    105    120

Pb 591      3.0    3.5    ...   19.0    ...    ...    ...  106.5
C 13        2.5    4.0    ...   31.0   44.0   82.0  111.5  119.5
Local       3.0    4.5   12.5   22.0   37.5   77.5  109.0  112.5

Solution:

Of the three varieties of wheat, the growth of C 13
is lower than that of the local variety up to the age of 45
days, and after this the growth of C 13 is the highest,
while that of Pb 591 is the minimum. The growth of the local
variety lies between the two.

Graph No. (9)

Representing the growth of three kinds of wheat

H.S.: 1 small div. = 3 days.


An inspection of the graph clearly shows that up to the
age of 45 days the growth of the local variety is the highest, and
after that the growth of C 13 is the highest. Thus, as regards the
character of growth, C 13 is the best.

Example No. (10):

The average yields in maunds per acre of rice and wheat
crops in a State of India since 1950-51 are given below.
Comment whether the yields are increasing or decreasing.

Year          Rice      Wheat

1950-51       5.27       8.88
1951-52       4.38       8.25
1952-53       5.31       9.21
1953-54       6.46       9.15
1954-55       6.12       9.64
1955-56       7.18       8.26
1956-57       6.21       8.46
1957-58       6.19       7.89
1958-59       7.13       8.61
1959-60       6.15       9.21
1960-61       7.15      10.91

(M.Sc. Ag. 1962)

Solution:

Let us first plot the yields of rice and wheat from
1950-51 to 1960-61 on graph paper. As a result of
this, we obtain the following graphs.

An eye-inspection of the graphs tells us that:

(i) The yield of wheat has an increasing tendency. The yield
from 1950-51 to 1954-55 is increasing except for a slight
downward fluctuation in the year 1951-52. Further, the
yield is decreasing up to 1957-58 and again increasing for
the remaining years.

(ii) The yield of rice is increasing from 1950-51 to
1955-56 except for a slight downward fluctuation in the year
1954-55. For the next two years, the yields are lower
and are nearly the same. For the year 1958-59 the yield increases,
decreases again for the year 1959-60 and increases for
1960-61. Thus on the whole, we can say that the yield
shows an increasing trend.



Graph No. (10)

Representing the yield of rice & wheat from 1950-51 to 1960-61

Example No. (11):

The following table provides the yearly figures of wheat
production in lakh tons in U.P. Comment on the wheat
production in U.P.

Year         Yield        Year         Yield

1943-44      19.01        1952-53      28.24
1944-45      23.56        1953-54      31.06
1945-46      22.83        1954-55      32.84
1946-47      23.37        1955-56      30.41
1947-48      21.94        1956-57      31.15
1948-49      19.74        1957-58      27.06
1949-50      24.26        1958-59      30.36
1950-51      26.78        1959-60      32.42
1951-52      25.33        1960-61      38.82

(M.Sc. Ag. 1963)

Solution:

The yields of wheat in U.P. from 1943-44 to 1960-61
are plotted on graph paper as shown below.

The graph shows that the yields are increasing up to
1945-46 and decreasing for the next two years.
Again, from 1948-49 to 1954-55 the yields are increasing
except for a downward fluctuation in the year 1951-52. After
this, the yield shows neither an increasing nor a decreasing
tendency; it merely fluctuates around 30 lakh tons.
But for the year 1960-61, the yield increases. Thus on the
whole, there is an increasing trend in the production of wheat.

Diagrams

The following are the important diagrams:

(1) One dimensional diagrams,

(2) Two dimensional diagrams,

(3) Three dimensional diagrams,

(4) Cartograms and

(5) Pictograms.

(1) One Dimensional Diagrams: The one dimensional
diagrams are lines or bars arranged in ascending or descending
order of magnitude (length) along a vertical or a
horizontal scale. The lengths of the bars are proportional to the
magnitudes of the items.

(2) Two Dimensional Diagrams: In the two dimensional
diagrams, the magnitudes of the items are represented by
areas, as of squares, rectangles, circles etc.

(3) Three Dimensional Diagrams: In the three
dimensional diagrams, the data are represented by cubes or
cylinders, and the magnitudes of the items are represented by
the volumes of the cuboids or cylinders.

(4) Cartograms: Here the data are presented by means
of maps.

(5) Pictograms: Here the data are presented by means of
pictures.

But the bar and circular diagrams are most commonly
used because of their accuracy and ease of sketching.

"One Dimensional Diagrams"

The bar diagrams in common use among the one dimensional
diagrams are of the following main types:

(1) Simple bar diagrams,

(2) Multiple bar diagrams,

(3) Sub-divided bar diagrams.

(1) Simple bar diagrams: Here the magnitudes of
the items are represented by thick bars of uniform width with
equal spacing between any two consecutive bars. The lengths
of the bars are proportional to the magnitudes of the items.
One bar is drawn for each item and the bars are arranged in
ascending or descending order of length along a vertical or
horizontal line. These simple bar diagrams are appropriate for
the data contained in simple tables.

Example (12):

Draw a simple bar diagram to represent the following
statistics relating to the area under different crops:

Crops                      Million Acres

Rice                           80.3
Wheat                          27.6
Jowar                          21.4
Other food crops               88.2
Oil seeds                      17.6
Cotton                         14.5
Other fibres                    3.1
Fodder crops                   10.2
Other non-food crops            3.9

Solution:

The acreage (million acres) under different crops is
shown by the simple bar diagram:


H.S.: 3 small divs. = width of each bar and
2 small divs. = spacing between two consecutive bars.

(2) Multiple Bar Diagrams: These bar diagrams
are constructed like simple bar diagrams. They represent more
than one type of data at a time, and so two or more bars (as
the case may be) are constructed side by side.

Example 13 (a):

The following table gives the number of motor-cars
produced in three countries during the period 1929-1935.

Year       Germany     France      U.K.

1929          96         254        241
1930          74         231        241
1931          68         201        226
1932          50         172        248
1933          99         189        296
1934         172         187        355
1935         245         166        417

Solution:

Taking three bars at a time, side by side, we complete
the diagram. Here the time (year) is taken along the X-axis
and the number of cars produced in these countries along
the Y-axis.

Diagram No. 2 (a)

"Multiple Bar Diagram for No. of Cars
Produced in U.K., France & Germany"

H.S.: 6 small divs. = 1 year

2 small divs. = width of each bar
1 small div. = spacing between each
set of three bars

Example No. 13 (b):

The following table provides the percentage of cultivators
and the percentage of cultivated area in different sizes of holdings
in U.P. Depict the data in a bar diagram drawn to scale.

(M.Sc. Ag. 1964)

Size of holding       Percentage          Percentage
(acres)               of cultivators      of area

Up to 1                   37.8                6.0
1-2                       18.0                8.1
2-3                       11.6                8.7
3-4                        8.1                8.4
4-5                        5.7                7.6
5-6                        4.2                6.8
6-7                        3.0                5.9
7-8                        2.3                5.1
8-9                        1.8                4.4
9-10                       1.4                3.9

Diagram No. 2 (b)

Solution:

The percentage of cultivators and the cultivated area in
different sizes of holdings will be depicted by the compound
bar-diagram. Corresponding to each holding size, two adjacent
bars will be constructed, one of which will represent
the % of cultivators and the other the % of cultivated area.

(3) Sub-divided Bar Diagrams: These diagrams are
suitable for the data given in complex tables, where the
magnitudes of the items are sub-divided into sub-classes.

Example (14):

The following table gives the population of males out
of the total population for different years for a district. Draw
a suitable diagram to show the populations given in the data.

Year       Total Population     Population of Males

1910            85761                 45761
1911            92821                 47212
1912            83728                 41312
1913            72511                 37256
1914            83123                 42218
1915            65735                 34725
1916            46849                 23428

Solution:

The bars are constructed for the different years and their
lengths are proportional to the magnitudes of the total
populations. Then each bar is sub-divided to show the
population of males for that year.

V.S.: 1 small div. = 3000 population

Diagram No. 3

"Sub-divided bar diagram for total population
& population of males"

from (1910-16)

Example (15):

The following table shows the expenditure incurred by
the Central Government and the governments of States of type
A & B in the 1st Five Year Plan under the six major heads.
Represent the data by means of a suitable diagram.

Expenditure in the 1st Five Year Plan (in crores of rupees)

Subject                        Central    Type A States    Type B States

Agr. & development              186.3         127.3             37.6
Irrigation & power              265.9         206.1             81.5
Transport & Communication       409.5          56.5             17.4
Industries                      146.7          17.9              7.1
Social Services                 191.4         192.3             28.9
Miscellaneous                    40.7          10.0              7.0

Totals                         1240.5         610.1            173.2

Solution:

The data can be represented by means of a sub-divided
bar diagram. Three bars will be constructed and their heights
will be in proportion to the total expenditures of the three
types of governments respectively. Further, each bar will be
sub-divided into six parts and the height of each part will be
proportional to the expenditure incurred under that head.

V.S.: 1 small div. = 20 crores of Rs.

Legend: Agr. & Develop., Irrigation & power, Transport & Comm.,
Industries, Social Services, Miscellaneous

Bars: C. Govt., Govt. of Type A, Govt. of Type B

The diagram provides the information about the total
expenditure of the Central, type A and type B governments.
It also gives the distribution of the total expenditure of
the three types of governments with respect to the items of
expenditure (Transport, Agriculture etc.).


Example No. (16):

Represent the data given below by a suitable diagram.
The table gives the birth rates and the death rates of six
countries of the world during the year 1937.

Name of the Country      Birth rate      Death rate

Egypt                       43.5            27.2
Canada                      19.8            10.2
United States               17.0            11.2
India                       34.5            22.4
Japan                       30.6            17.0
Germany                     18.8            11.7

Solution:

These birth and death rates can be represented by sub-divided
bar diagrams. Six bars are constructed with their
heights proportional to the birth rates. Further, the bars are
sub-divided into 2 parts: the lower one shows the death rate
and the remaining upper portion shows the difference between
the birth and the death rates.

Showing the birth & death rates of six countries in 1937

H.S.: 2 small divs. = width of each bar

62 Statistical Methods in Education

Percentage sub-divided bar-diagram:

In this diagram, the total values are taken equal to 100
and the component parts are expressed as percentages. As the
length of each bar in this diagram is the same, it cannot
give a comparison between the absolute magnitudes of the
components. But the relative changes in the component parts
can be satisfactorily compared.

Example (17):

With the help of a suitable diagram, show the absolute
as well as the relative changes in the student population of
colleges A and B in the different faculties in 1947.

Faculty         College A      College B

Arts               350            200
Science            500            250
Commerce           650            150
Law                300            120

Totals            1800            720

Solution:

(a) For the comparison of the absolute changes in the
student population, the sub-divided bar diagram will be a
suitable diagram.




V.S.: 1 small div. = 30 students

Diagram No. 6 (a)

Showing the absolute changes in student population in 1947

Bars: College A, College B

H.S.: 5 small divs. = width of each bar

(b) The relative changes in the component parts can be
shown by the percentage sub-divided bar diagram. For this
purpose, we first change the absolute magnitudes into
percentages.

              College (A)                          College (B)
Faculty       Absolute    Relative magnitude       Absolute    Relative magnitude
              magnitude   in %                     magnitude   in %

Arts             350      350/1800 × 100 = 19.44      200      200/720 × 100 = 27.78
Science          500      500/1800 × 100 = 27.78      250      250/720 × 100 = 34.72
Commerce         650      650/1800 × 100 = 36.11      150      150/720 × 100 = 20.83
Law              300      300/1800 × 100 = 16.67      120      120/720 × 100 = 16.67

Totals          1800      100                         720      100
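
The conversion of absolute magnitudes into percentages for a percentage sub-divided bar diagram can be sketched in Python as below. The function name `to_percentages` is illustrative; the enrolments are those whose totals are 1800 for College A and 720 for College B, and the results are rounded to two decimals.

```python
def to_percentages(values):
    """Express each component as a percentage of the total, to two decimals."""
    total = sum(values)
    return [round(100 * value / total, 2) for value in values]


faculties = ["Arts", "Science", "Commerce", "Law"]
college_a = [350, 500, 650, 300]   # total 1800
college_b = [200, 250, 150, 120]   # total 720

percent_a = to_percentages(college_a)
percent_b = to_percentages(college_b)

for faculty, a, b in zip(faculties, percent_a, percent_b):
    print(f"{faculty:<10} {a:6.2f} {b:6.2f}")
```

Each bar of the percentage diagram is then cut at the running totals of these percentages, so both bars have the same length of 100.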

Diagram No. 6 (b)
Showing the relative changes in
student population in 1947

Legend: Arts, Science, Commerce, Law

H.S.: 3 small divs. = width of each bar
4 small divs. = spacing between two bars

Example (18):

For the following data, show by a suitable diagram the
comparison between the relative and the absolute changes.

Principal heads        Year 1938-39       Year 1939-40
of Revenue             (Lakhs of Rs.)     (Lakhs of Rs.)

Custom                     4050               4588
Central Excise              868                652
Corporation tax             204                238
Income tax                 1374               1420
Other heads                 974               1262

Solution:

(a) For the comparison of the absolute changes, the
sub-divided bar diagram will be a suitable diagram.

Diagram No. 7 (a)

Showing absolute changes in amounts incurred on various
heads in 1938-39 & 1939-40

Legend: Custom, C. Excise, Corp. tax, Income tax, Other heads

H.S.: 3 small divs. = width of each bar = 1 year

4 small divs. = spacing between two bars

(b) For the relative changes in the component parts, the
percentage sub-divided bar diagram will be constructed. For
this purpose, we change the absolute magnitudes into
percentages as shown below:

                       Year 1938-39                  Year 1939-40
Heads of Revenue       Absolute     Relative         Absolute     Relative
                       magnitude    magnitude in %   magnitude    magnitude in %

Custom                   4050          54.3            4588          56.2
Central Excise            868          11.6             652           8.0
Corporation tax           204           2.7             238           2.9
Income tax               1374          18.4            1420          17.4
Other heads               974          13.0            1262          15.5

Totals                   7470         100              8160         100

Diagram No. 7 (b)

Showing relative changes in amounts
incurred on various heads in
1938-39 & 1939-40

Legend: Custom, C. Excise, Corp. tax, Income tax, Other heads

H.S.: 3 small divs. = width of each bar = 1 year
4 small divs. = spacing between two bars

Bilateral Bars: In this case, the bars are drawn above or
below the base line (in the case of vertical bars), or to the
left and right of the base line (in the case of horizontal
bars). The bars above the base line or to the right of the base
line are used for the +ve quantities, and the bars below the base
line or to the left of the base line are used for the -ve quantities
in the data at hand.

Example (19):

The following table gives the number of houses in ten
small towns of India during the censuses of 1941 and 1951.

Town          :   1    2    3    4    5    6    7    8    9   10

No. of houses
in 1941       : 200  300  400  480  520  300  400  280  750  570

No. of houses
in 1951       : 250  340  420  495  530  290  380  250  710  520

Represent the increase or decrease in the number of
houses in 1951 in comparison to the census of 1941.

Solution:

In some cases there is an increase, while in others there is
a decrease. The arrangement is made such that the towns which
show an increase are written first and the towns which show
a decrease are written afterwards. Within these groups, the towns
have been arranged according to the magnitudes of the changes
(increase or decrease).



           No. of houses in
Town       1941       1951       Difference

 1          200        250          +50
 2          300        340          +40
 3          400        420          +20
 4          480        495          +15
 5          520        530          +10
 6          300        290          -10
 7          400        380          -20
 8          280        250          -30
 9          750        710          -40
10          570        520          -50
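
The differences and the ordering used for the bilateral bars can be sketched in Python as follows. The variable names are illustrative, and town 2 is taken to have 300 houses in 1941, the figure consistent with its stated difference of +40.

```python
# Number of houses in the ten towns, Example (19).
houses_1941 = {1: 200, 2: 300, 3: 400, 4: 480, 5: 520,
               6: 300, 7: 400, 8: 280, 9: 750, 10: 570}
houses_1951 = {1: 250, 2: 340, 3: 420, 4: 495, 5: 530,
               6: 290, 7: 380, 8: 250, 9: 710, 10: 520}

# Change from 1941 to 1951: positive = increase, negative = decrease.
changes = {town: houses_1951[town] - houses_1941[town] for town in houses_1941}

# Largest increase first, largest decrease last.
order = sorted(changes, key=lambda town: changes[town], reverse=True)

for town in order:
    side = "right of base line" if changes[town] > 0 else "left of base line"
    print(f"Town {town:>2}: {changes[town]:+4d}  ({side})")
```

Sorting the signed differences in descending order puts all the increases before the decreases, which is the arrangement described in the solution.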


In the diagram, the +ve changes (increase in the no. of
houses) are shown by the bars drawn to the right of the base
line and those showing the -ve changes (decrease in the no.
of houses) are drawn to the left of the base line.

Example No. (20):

Represent the following data by a suitable diagram
showing the differences between the proceeds & costs.

Year       Total Proceeds             Total Costs
           (in thousands of Rs.)      (in thousands of Rs.)

1955            22.0                      19.5
1956            27.3                      21.7
1957            28.2                      30.0
1958            30.3                      25.6
1959            32.7                      26.1
1960            33.3                      34.2

Solution:

First of all we take the differences of the total proceeds
& the total costs, and then the +ve differences are written first
and the -ve differences after them. Within these groups, they
have been arranged according to the magnitudes of the changes
(increase or decrease).

Year       : 1959   1956   1958   1955   1960   1957

Difference : +6.6   +5.6   +4.7   +2.5   -0.9   -1.8

(2) Two Dimensional Diagrams (Area Diagrams):
In these diagrams, the areas are proportional to the
magnitudes of the data. The common diagrams of this type
are rectangles, pie and square diagrams. The square and pie
diagrams serve the same purpose, but the pie diagrams are the
easiest to draw and can be made accurately. Hence the
pie diagrams are commonly used in place of square diagrams.

(a) Pie/circle diagrams: When the differences
between any two quantities to be compared are large, bar
diagrams cannot be used, as one of the bars will be extremely
small and the other large. In such cases, the square or pie
diagrams are used. For drawing the circles, the square roots of
the quantities are taken as the radii of the circles, and so the
areas of the circles are proportional to the magnitudes of the data.

(b) Sub-divided pie diagrams: When we want to
compare the totals as well as their components with one
another, we may use sub-divided pie diagrams. The total value
is equated to 2π = 360° and then the component parts are
expressed in terms of angles. Having determined the angles
corresponding to the different components, the circle is divided
into sectors with the angles so determined.

Example No. (21):

Represent the following data by a suitable diagram:

Countries          Production of cane sugar in 1938-39
                   in Quintals (0000's omitted)

1. India                      2750
2. Java                       1550
3. Hawaii                      835
4. Columbia                     51

Solution:

Here the difference between the quantities 51 & 2750
is very large. Thus a pie diagram will be a suitable diagram to
represent the data. Now we construct the following table:

Countries        Quintals              Square roots     Radii in inches
                 (0000's omitted)
                      (1)                   (2)               (3)

1. India             2750                  52.44             1.05
2. Java              1550                  39.37             0.79
3. Hawaii             835                  28.90             0.58
4. Columbia            51                   7.14             0.14

Column (2) gives the square roots of the figures
written in column (1), and column (3) contains
the numbers obtained by dividing the numbers of
column (2) by 50, which are the radii in inches. The circles
with these radii are arranged in descending order. For
finding the scale, we calculate the area of the 1st circle =
π × (1.05)² = 3.14 × 1.1025 sq. inches.

Thus 3.14 × 1.1025 sq. inches = 2750,0000 quintals
1 sq. inch = 796,5000 quintals.

Scale : 1 sq. inch=796,5000 quintals. 
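
The radii of the circles in Example (21) can be reproduced with a short Python sketch. The variable names are illustrative; the divisor 50 is the one used in the solution, and the scale is computed with the full value of π rather than 3.14, so it comes out at roughly 796 (in units of 0000 quintals) per square inch.

```python
import math

# Production of cane sugar in 1938-39, in quintals with 0000's omitted.
production = {"India": 2750, "Java": 1550, "Hawaii": 835, "Columbia": 51}

DIVISOR = 50   # the square roots are divided by 50 to give radii in inches

# Radius of each circle: square root of the quantity, reduced by the divisor.
radii = {country: round(math.sqrt(quantity) / DIVISOR, 2)
         for country, quantity in production.items()}

# Area = pi * (sqrt(q) / DIVISOR)**2 = pi * q / DIVISOR**2, so one square inch
# stands for DIVISOR**2 / pi units of the quantity (before rounding the radii).
units_per_sq_inch = DIVISOR ** 2 / math.pi

print(radii)
print(round(units_per_sq_inch, 1))
```

Because the radius varies as the square root of the quantity, the area of each circle varies as the quantity itself, which is the property the pie diagram relies on.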





Example (22):

The allocations for Madras & U.P. States under the
Second Five Year Plan are as given below:

Heads                                   Madras State        U.P. State
                                        (lakhs of Rs.)      (lakhs of Rs.)

1. Agr. & Community Development            3535.5              6763.9
2. Irrigation & Power                      7125.0              8042.5
3. Industry & Mining                       1520.0              1643.4
4. Transport                                807.5              1723.2
5. Social Services                         4065.9              6863.8
6. Miscellaneous                            252.1               272.8

Draw a sub-divided pie diagram to compare the cost of
development under each head in the two States.

Solution:— 

The solution for the problem is given in the tabular form 
as shown on the next page. 




The square roots of the totals for the two States, reduced to a convenient scale, are taken as the
radii of the circles. Then the sectors are cut from these circles according to the angles
shown in the table. In this way, the areas represented by the different sectors of a circle are proportional
to the magnitudes of the components under the main heads for a State. Since 3.14 × (1.3)² =
5 31 equivalents tol7306 lakhs of Rs, so Scale is 1 sq. inche=3259 1 Lakhs ofRs. 
Example (23): 

Represent the data of the following table by means of 
a suitable sub-divided Pie diagram: 

Clearing house statistics in 1940-41 and 1947-48 in 
certain cities: 

City             Total amount (Rs.) 
                 1940-41         1947-48 

Bombay            80,232         255,264 
Calcutta         100,853         259,996 
Delhi              2,853          12,646 
Kanpur             1,920          10,983 
Karachi            4,676          27,481 
Lahore             1,633           4,954 
Madras            10,865          34,794 
Others             4,228          51,896 

Totals           207,260         658,014 


Solution: 

The steps in the construction of the Pie diagram are given 
below: 

(i) First of all, calculate the square roots of the total 
amounts of clearing house returns for the two years, i.e. calculate 
√207260 and √658014, which are 455 & 811 respectively. 

(ii) Divide these quantities (455, 811) by a suitable 
number to obtain the radii of the circles. 

(iii) Then calculate the angles corresponding to the 
amounts for each city in the two years separately. The calculations 
are done in the following tabular manner: 
Cities       Year 1940-41                          Year 1947-48 
             Angles calculated (degree)            Angles calculated (degree) 

Bombay       80232 × 360 / 207260 = 139.4          255264 × 360 / 658014 = 139.7 
Calcutta     100853 × 360 / 207260 = 175.2         259996 × 360 / 658014 = 142.3 
Delhi        2853 × 360 / 207260 = 5.0             12646 × 360 / 658014 = 6.9 
Kanpur       1920 × 360 / 207260 = 3.3             10983 × 360 / 658014 = 6.0 
Karachi      4676 × 360 / 207260 = 8.1             27481 × 360 / 658014 = 15.0 
Lahore       1633 × 360 / 207260 = 2.8             4954 × 360 / 658014 = 2.7 
Madras       10865 × 360 / 207260 = 18.9           34794 × 360 / 658014 = 19.0 
Others       4228 × 360 / 207260 = 7.3             51896 × 360 / 658014 = 28.4 

Totals       360                                   360 



Finally we divide 455 & 811 by a number, say 400, to get 
1.14 and 2.03 as the approximate radii in inches, and the 
sectors on various angles are cut from these circles. The area 
enclosed within each sector represents the magnitude of the 
returns in the specified years for different cities. 

Scale : 1 sq. inch = 52389 Rs. 
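
The three steps of this solution can be sketched in Python as follows. This is only an illustrative sketch: the names `RETURNS`, `DIVISOR`, `pie_radius` and `sector_angles` are assumptions made for the example, and the Karachi figure for 1940-41 is taken as 4676, the value used in the angle calculation.

```python
import math

# Clearing house returns (Rs.) for the two years of Example 23.
# Karachi 1940-41 is taken as 4676, the figure used in the angle calculation.
RETURNS = {
    "1940-41": {"Bombay": 80232, "Calcutta": 100853, "Delhi": 2853, "Kanpur": 1920,
                "Karachi": 4676, "Lahore": 1633, "Madras": 10865, "Others": 4228},
    "1947-48": {"Bombay": 255264, "Calcutta": 259996, "Delhi": 12646, "Kanpur": 10983,
                "Karachi": 27481, "Lahore": 4954, "Madras": 34794, "Others": 51896},
}

DIVISOR = 400  # the "suitable number" of step (ii); any convenient value will do


def pie_radius(total, divisor=DIVISOR):
    """Radius in inches: square root of the total, divided by the chosen number."""
    return math.sqrt(total) / divisor


def sector_angles(amounts):
    """Angle in degrees of each sector: amount x 360 / total."""
    total = sum(amounts.values())
    return {name: amount * 360 / total for name, amount in amounts.items()}


for year, amounts in RETURNS.items():
    total = sum(amounts.values())
    print(f"{year}: total = {total}, radius = {pie_radius(total):.2f} in.")
    for city, angle in sector_angles(amounts).items():
        print(f"    {city:<9}{angle:6.1f}")
```

Because the radii are proportional to the square roots of the totals, the areas of the two circles stay proportional to the totals themselves, which is what allows a single scale in Rs. per sq. inch.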
Diagram No. 12 

representing clearing house statistics 

Year : 1940-41          Year : 1947-48 
Example 24: 

The following table gives the total outlay on rural development 
proposed in the First Five Year Plan and its breakdown 
into major items. Give a suitable diagrammatic representation 
of the data. (M.Sc. Ag., Agra, 1965) 

Item                                             Amount 
                                                 (in crores of rupees) 

Agr. & community development                      360.43 
Irrigation                                        167.97 
Irrigation & Power (multipurpose projects)        265.90 
Power                                             127.54 
Transport & communications                        497.10 
Industry                                          173.04 
Social services                                   339.81 
Rehabilitation                                     85.00 
Miscellaneous                                      51.99 

Total                                           2,068.78 



Solution:— 

The suitable diagram for representing the above data is a 
sub-divided Pie diagram. The total area of the circle will represent 
the total amount, i.e. 2068.78 crores of rupees, and its 
components will represent the various items of expenditure 
under the plan. 
Item                                             Amount                    Angle 
                                                 (in crores of rupees) 

Agr. & com. development                           360.43                   62.7° 
Irrigation                                        167.97                   29.2° 
Irrigation & power (multipurpose projects)        265.90                   46.3° 
Power                                             127.54                   22.2° 
Transport & communication                         497.10                   86.5° 
Industry                                          173.04                   30.1° 
Social services                                   339.81                   59.1° 
Rehabilitation                                     85.00                   14.8° 
Miscellaneous                                      51.99                    9.0° 

Totals                                           2068.78                   359.9° ≈ 360° 



Let the radius of the circle be 2"; then πr² = Area, 
or (22/7) × 4 sq. inches = 2068.78 crores of rupees. 

Scale : 1 sq. inch = 164.56 crores of rupees. 
Rectangles : The rectangles are two-dimensional 
diagrams and are used when two magnitudes which are 
related to a third one are to be represented on the same diagram. 
For example, the average yield of wheat multiplied by the 
total acreage under wheat represents the total production of 
wheat. To represent it diagrammatically, one side of the rectangle 
is taken proportional to the average yield and the other 
proportional to the total acreage; the area of the rectangle gives 
the total production of wheat. Sub-divided rectangles are used 
when one of the magnitudes is divided into several 
components. 

Example No. (25) 

An analysis of the monthly wages paid to workers in 
the two firms A & B gives the following results: 

                                   Firm A      Firm B 

No. of wage earners                   80         100 
Average monthly wage (Rs.)          52.5        47.5 

Represent the above facts diagrammatically. 

Solution: 

Here the average monthly wage multiplied by the total number 
of workers gives the monthly pay roll. Hence the most suitable 
diagram for representing the above data is a pair of 
rectangles, one for each firm. One side of each rectangle 
will be proportional to the average wage and the other will be 
proportional to the total number of workers employed in the firm. 
Diagram No. 14 

Showing the average monthly wage and number 
of wage earners 



The family budget data, where we compare the 
absolute totals as well as the percentages of the various 
items to the total, are very suitably represented by 
rectangles. In this case, the height of each rectangle is kept at 
100 and the width is made proportional to the total 
magnitude. 

Example No. 26 

The following table gives the details of the monthly 
expenditure of three families. Represent them by a suitable 
diagram: 

Item of the         Family A      Family B      Family C 
expenditure         Rs. An.       Rs. An.       Rs. An. 

Food                 12-0          25-0          28-0 
Clothing              2-8           8-0          14-0 
House rent            2-0           4-0           7-0 
Education             1-0           5-0           7-0 
Miscellaneous         2-8           8-0          14-0 

Totals               20-0          50-0          70-0 
Solution: 

First we prepare the following table: 

Item of             Family A             Family B             Family C 
expenditure         Actual       %       Actual       %       Actual       % 
                    expenses             expenses             expenses 

Food                 12.00     60.00      25.00     50.00      28.00     40.00 
Clothing              2.50     12.50       8.00     16.00      14.00     20.00 
House rent            2.00     10.00       4.00      8.00       7.00     10.00 
Education             1.00      5.00       5.00     10.00       7.00     10.00 
Miscellaneous         2.50     12.50       8.00     16.00      14.00     20.00 

Total                20.00    100.00      50.00    100.00      70.00    100.00 


The heights of the three rectangles will be taken as 100 
each and their widths in the ratio 20:50:70. Each rectangle 
will be further divided into component parts according to the 
figures in the percentage column. 
The above diagram represents: 

(1) The actual expenses on each item by the area of the 
corresponding component part of the rectangle. 

(2) The % of the item to the total by the height of the 
corresponding component part of the rectangle. 

(3) The total expenses for the family by the whole area 
of the rectangle. 
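
The percentage table of this example can be reproduced with a short Python sketch. The helper names `rupees` and `percentage_heights` are assumptions made for the illustration, and the conversion assumes the pre-decimal rule of 16 annas to the rupee, which agrees with the entry 2-8 appearing as 2.50 in the table.

```python
ANNAS_PER_RUPEE = 16  # pre-decimal currency: 16 annas = 1 rupee


def rupees(rs, annas=0):
    """Convert a 'Rs.-An.' entry such as 2-8 into decimal rupees (2.50)."""
    return rs + annas / ANNAS_PER_RUPEE


BUDGETS = {
    "A": {"Food": rupees(12), "Clothing": rupees(2, 8), "House rent": rupees(2),
          "Education": rupees(1), "Miscellaneous": rupees(2, 8)},
    "B": {"Food": rupees(25), "Clothing": rupees(8), "House rent": rupees(4),
          "Education": rupees(5), "Miscellaneous": rupees(8)},
    "C": {"Food": rupees(28), "Clothing": rupees(14), "House rent": rupees(7),
          "Education": rupees(7), "Miscellaneous": rupees(14)},
}


def percentage_heights(items):
    """Height of each component when the whole rectangle has height 100."""
    total = sum(items.values())
    return {item: 100 * amount / total for item, amount in items.items()}


# Width of each rectangle is proportional to the family's total expenditure.
widths = {family: sum(items.values()) for family, items in BUDGETS.items()}

for family, items in BUDGETS.items():
    print(f"Family {family}: width proportional to {widths[family]:.2f}")
    for item, pct in percentage_heights(items).items():
        print(f"    {item:<14}{pct:6.2f} %")
```

With height fixed at 100 and width equal to the total, the area of a component is proportional to (percentage × total), that is, to the actual expense on that item.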
Example (16): 

Represent the following data by a suitable diagram: 

Year         Average yield     Total yield     Area under crop (acreage) 
             (Md/acre)                         Irrigated    Non-irrigated    Total 

1940-41          9.5              427.5           22            23            45 
1941-42          8.4              361.2           23            20            43 


Solution: 

Here we note the relation: 

Average yield × total area = total yield. 

So rectangles are the suitable diagrams to represent 
the given data. Here we construct the two rectangles on the 
same horizontal axis, taking their widths in the ratio 9.5 : 8.4 
and heights in the ratio 45 : 43. Each rectangle will be further 
divided into two components according to the irrigated and 
non-irrigated areas. 
Diagram No. 16 

Showing the average yield (md/acre) & area under crop, 
irrigated and non-irrigated 

Scale : 1 small div. = 1 Md./acre 

Three dimensional Diagrams : When the ratio 
between the two quantities to be compared is so large that 
two-dimensional diagrams are not suitable for representing 
them, we may represent them by three-dimensional 
diagrams. The volumes of the diagrams represent the magnitudes 
of the quantities to be compared. In this type of diagram, 
spheres, cubes and prisms are common, but cubes are 
generally drawn in practice. 
To calculate the arithmetic mean of the milk yield, we 
prepare the following table: 

(1)                 (2)                 (3) 
Milk yield x        No. of cows f       fx 
(in kgm.) 

11                    6                   66 
14                    9                  126 
17                   12                  204 
21                   16                  336 
25                   21                  525 
30                    9                  270 
32                    7                  224 

Totals               80 = Σf            1751 = Σfx 

In column (1), we put the variate values and in column 
(2) their respective frequencies against them. In column (3) 
we put the product of (1) & (2). For example, the first value 
66 of col. (3) is the product of 11 & 6, which are the first 
figures in col. (1) & (2) respectively. Similarly, the 
other figures of col. (3) are obtained. Thus, we get: 

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{\text{sum of col. (3)}}{\text{sum of col. (2)}} = \frac{1751}{80} = 21.8875 \text{ kgm.}$$

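
The tabular calculation above can be checked with a few lines of Python. This is only a sketch; the function name `frequency_mean` is an assumption made for the illustration.

```python
MILK_YIELD_KG = [11, 14, 17, 21, 25, 30, 32]   # column (1): variate values x
NO_OF_COWS = [6, 9, 12, 16, 21, 9, 7]          # column (2): frequencies f


def frequency_mean(values, frequencies):
    """Mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    products = [x * f for x, f in zip(values, frequencies)]   # column (3): fx
    return sum(products) / sum(frequencies)


print("sum f   =", sum(NO_OF_COWS))
print("sum fx  =", sum(x * f for x, f in zip(MILK_YIELD_KG, NO_OF_COWS)))
print("mean    =", frequency_mean(MILK_YIELD_KG, NO_OF_COWS), "kgm.")
```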


This formula can be applied in a grouped frequency 
distribution after the classes have been replaced by their 
respective mid-values. The procedure of calculation will be 
clear from the following Exp.— 
definite which can be determined mathematically. 

(2) It should be based on all observations. 

(3) Its calculation should not be lengthy and tedious. 

(4) It should be least affected by sampling fluctuations. 

(5) It must be capable of algebraic treatment, i.e. if the 
averages of the component series are known, then the average 
of the whole series should be expressible in terms of the averages 
of the component series. 

Relative merits & demerits of different averages : 

(1) Mean (Arithmetic average) 

Advantages: 

(1) It is readily understood and well defined. 

(2) It is based on all observations. 

(3) It is easy to calculate. 

(4) In most cases, it is not affected by sampling 
fluctuations. 

(5) It is capable of algebraic treatment. 

(6) It gives weights to all items which are directly 
proportional to their sizes. 

Disadvantages: 

(1) It sometimes gives values which may not be physically 
possible, e.g. the average number of eggs laid by a hen 
as 18.5 per month. 

(2) It gives undue weight to the extreme items. 

(3) It cannot be calculated in cases where the 
extreme ends are open. 

(2) Mode : 

Advantages:— 

(1) In most of the cases, it is easily found. 

(2) It can be found out from the graph merely by an 
eye inspection. 

(3) It can be found out for the distributions where the 
ends are open. 

(4) It is the type that to the ordinary mind, seems to 
be the best to represent the group. 
Disadvantages: 

(1) No weights are given to the extreme items. 

(2) A clearly defined mode does not always exist. 

(3) It is not capable of algebraic treatment. 

(3) Median : 

Advantages: 

(1) It is well defined and can be calculated easily. 

(2) It does not give undue weight to the extreme items. 

(3) It is possible to calculate even in cases where the 
intervals are open. 

Exp. No. 2 

What is an arithmetic average ? What are its properties ? 
How would you calculate the arithmetic average ? 

Sol: 

The arithmetic mean of a series of individuals is obtained 
by adding up all the values and then dividing the total by 
the number of individuals. 

For example, if the heights of 10 randomly chosen 
plants are 52", 55", 57", 61", 64", 65", 67", 68", 70" & 71", 
then the arithmetic mean or simply the mean of this sample 
of 10 plants will be the sum (52 + 55 + ... + 71 = 630) 
divided by 10, i.e. mean = 63". 

In general, if $x_1, x_2, \dots, x_n$ are the values of n 
measurements, the arithmetic mean $\bar{X}$ of the x's is defined 
by the relation: 

$$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n} \qquad \text{(i)}$$

where $x_1, x_2, \dots, x_n$ are the given values, 
n = number of items, 
$\sum$ = the sum of, 
$\bar{X}$ = arithmetic mean of x. 

Now, the formula (i) will be read as "The arithmetic 
mean of x's is the sum of x's divided by their number." 
Exp. No. 5 

If $x_1, x_2, \dots, x_n$ be the n values of any measurement 
with their respective frequencies $f_1, f_2, \dots, f_n$ and mean $\bar{X}$, 
then show that 

$$\bar{X} = A + \frac{\sum fd}{n},$$

where A is the assumed mean and 
d is the deviation of x from A, 
i.e. d = (x − A). 

Sol: 

We define 

$$\bar{X} = \frac{\sum fx}{n} = \frac{\sum f(x - A + A)}{n} = \frac{\sum f(x - A)}{n} + \frac{A \sum f}{n} = \frac{\sum f(x - A)}{n} + A \cdot \frac{n}{n}$$

$$\therefore \bar{X} = A + \frac{\sum f(x - A)}{n} \qquad \text{(i)}$$

or 

$$\bar{X} = A + \frac{\sum fd}{n}$$

In future, for the computation of the mean, we shall apply 
the above formula (i), which reduces the bulk of calculations 
and is hence called the short cut formula for the mean. 

Exp. No. 6 

Using the short cut method, compute the mean of the 
following data: 

(a) No. of branches/plant (x) : 5 10 15 20 25 30 35 40 45 50 

No. of plants (f) : 20 43 75 67 72 45 39 9 8 6 

(b) Age (years) : 0-10, 10-20, 20-30, 30-50, 50-80 

No. of persons : 61, 49, 40, 60, 23 



Chapter III 


Measures of Central tendency 
(averages) and dispersion 

The histogram or frequency curve gives the general idea 
of the distribution of the variate and hence the frequency 
graph can be used to study and compare the given distributions. 
But the study through graphs depends upon the 
accuracy and skill of the eye, which is an uncertain factor. 
Thus the study and comparisons by it may not be very 
reliable. Therefore it is necessary to know certain features of 
the distribution, which give an idea of the distribution and can 
be determined arithmetically. Two such features of the distribution 
are its average and dispersion (variation). 

Average : An average is the value of the variate which 
claims to represent the distribution. Some of the variate values 
will be above this value and others below it, and so it is known 
as a measure of the central tendency. 

There are three averages: (i) Mean, (ii) Mode and 
(iii) Median. 

Exp. No. 1 

What are the properties of an ideal average ? In what 
circumstances would you consider the mean, the mode and the 
median the most suitable statistics to describe the central 
tendency of the distribution ? 

Sol: — 

An ideal average should have the following properties: — 

(1) It should be rigidly defined and its value should be 
Exp. No. 4 

Given the following distribution, calculate the mean. 

Height of plants     No. of      Height of plants     No. of 
(in cms.)            plants      (in cms.)            plants 

12.5-17.5              2         37.5-42.5              4 
17.5-22.5             22         42.5-47.5              6 
22.5-27.5             19         47.5-52.5              1 
27.5-32.5             14         52.5-57.5              1 
32.5-37.5              3 


Solution: 

The mean is given by the formula $\bar{X} = \frac{\sum fx}{n}$, where the 
computations of $\sum fx$ and n are given below in the tabular form: 

(1)         (2)         (3) 
x           f           fx 

15           2           30 
20          22          440 
25          19          475 
30          14          420 
35           3          105 
40           4          160 
45           6          270 
50           1           50 
55           1           55 

Totals      72 = Σf     2005 = Σfx 
In col. (1) we have put the mid-values of the 
classes and in col. (2) the respective class frequencies. The 
products of (1) & (2) are put in col. (3). 

Therefore, $\sum fx$ is the sum of all figures in col. (3) 
and $\sum f = n$ is the sum of all figures in col. (2). Hence we have 

$$\bar{X} = \frac{\sum fx}{n} = \frac{2005}{72} = 27.85 \text{ centimetres.}$$
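
For a grouped distribution the same calculation can be sketched in Python by first replacing each class with its mid-value. The names `HEIGHT_CLASSES` and `grouped_mean` are assumptions made for this illustration.

```python
# (lower limit, upper limit, frequency) for each class of plant heights in cm.
HEIGHT_CLASSES = [
    (12.5, 17.5, 2), (17.5, 22.5, 22), (22.5, 27.5, 19),
    (27.5, 32.5, 14), (32.5, 37.5, 3), (37.5, 42.5, 4),
    (42.5, 47.5, 6), (47.5, 52.5, 1), (52.5, 57.5, 1),
]


def grouped_mean(classes):
    """Mean of a grouped distribution, each class represented by its mid-value."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n


print(f"mean height = {grouped_mean(HEIGHT_CLASSES):.2f} cm")
```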


For a short cut method for computing the arithmetic 
mean, the formula is: 

$$\bar{X} = A + \frac{\sum fd}{n}$$

where A is the assumed mean and 
d = (x − A), the deviation of the variate from the 
assumed mean. 

Properties of the Arithmetic Mean 

(1) The sum of the deviations of the values of x from 
their mean $\bar{X}$ is always zero, i.e. if $x_1, x_2, \dots, x_n$ be the variate 
values and $\bar{X}$ be their mean, then $(x_1 - \bar{X}), (x_2 - \bar{X}), \dots, (x_n - \bar{X})$ 
are their deviations from the mean and the sum of the 
deviations is zero, i.e. $\sum (x - \bar{X}) = 0$. 

(2) If $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k$ be the arithmetic means of k 
distributions with respective frequencies $n_1, n_2, \dots, n_k$, then 
the mean $\bar{X}$ of the whole distribution is given by: 

$$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \dots + n_k \bar{X}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum n \bar{X}}{\sum n}$$

(3) If $x_1, x_2, \dots, x_n$ be the variate values with their 
mean $\bar{X}$ and a variate y = ax + b is obtained, then $\bar{y} = a\bar{X} + b$, 
where a & b are constants. 
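
The three properties can be verified numerically. The sketch below uses the twelve rice yields quoted in Exp. No. 3 as sample data; the split into two component series and the constants a and b are arbitrary choices made only for the illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


yields = [68, 80, 94, 104, 120, 114, 125, 130, 130, 140, 141, 145]
x_bar = mean(yields)

# Property (1): the deviations from the mean add up to zero.
deviation_sum = sum(x - x_bar for x in yields)

# Property (2): the mean of the whole series from the means of its components.
# The split after the fifth value is an arbitrary, hypothetical choice.
first, second = yields[:5], yields[5:]
combined = (len(first) * mean(first) + len(second) * mean(second)) / (len(first) + len(second))

# Property (3): y = a*x + b gives mean(y) = a*mean(x) + b (a, b chosen arbitrarily).
a, b = 2.5, -10
y_bar = mean([a * x + b for x in yields])

print(math.isclose(deviation_sum, 0, abs_tol=1e-9))
print(math.isclose(combined, x_bar))
print(math.isclose(y_bar, a * x_bar + b))
```

The comparisons use `math.isclose` rather than `==` because floating point sums are exact only up to rounding.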
Exp. No. 3 

The yields of rice in 12 equal plots of a village are 
given as follows (in maunds): 

68, 80, 94, 104, 120, 114, 
125, 130, 130, 140, 141, 145. Find the mean yield. 

Solution: 

If the yields are denoted by x's, then 

$$\bar{X} = \frac{\sum x}{n} = \frac{68 + 80 + \dots + 145}{12} = \frac{1391}{12} \approx 116 \text{ maunds/plot.}$$

If the value $x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times, and 
so on, then 

$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{n}, \quad \text{as } \sum f = n$$

where $\bar{X}$ = arithmetic average, $x_1, x_2, \dots, x_n$ are 
the given values of the variate x, 
$f_1, f_2, \dots, f_n$ are the respective frequencies 
of $x_1, x_2, \dots, x_n$, and 
$n = f_1 + f_2 + \dots + f_n = \sum f$. 

To make the procedure clear, let us take the yield of 
milk by 80 cows of a dairy farm on a certain day (in 
kilograms): 

Milk-yield  : 11 14 17 21 25 30 32 

No. of cows :  6  9 12 16 21  9  7 
Solution: 

(a) To calculate the mean of this data, first we prepare 
the following table: 

A = 25 

The assumed mean, i.e. A, should be taken at such a 
value of x which divides the whole distribution approximately 
into two equal parts, so that some of the 'd' values are −ve 
and the others +ve. Here, we have taken A = 25 and col. 
(3) is obtained by subtracting A = 25 from the figures of 
col. (1); thus the first value in col. (3) is 5 − 25 = −20, and so 
on for the others. Col. (4) is obtained as the respective 
product of (2) & (3). Then we compute 

$$\bar{X} = A + \frac{\sum fd}{n} = 25 + \frac{-1070}{384} = 25 - 2.8 = 22.2$$

∴ $\bar{X}$ = 22.2 branches/plant. 
(b) To calculate the mean of this data, the following 
table is prepared: 

A = 25 

(1)               (2)          (3)              (4) 
x (mid-value)     f            d = x − 25       fd 

 5                61           −20              −1220 
15                49           −10               −490 
25                40             0                  0 
40                60            15                900 
65                23            40                920 

Totals           233 = Σf                        110 = Σfd 

Col. (1) of this table has been obtained by replacing 
the classes by their respective mid-values, and col. (3) is the 
result of subtracting A = 25 (assumed mean) from the figures 
of col. (1). Col. (4) is the product of the respective figures of 
cols. (2) & (3). Then we compute the mean 

$$\bar{X} = A + \frac{\sum fd}{n} = 25 + \frac{110}{233} = 25 + 0.47$$

∴ $\bar{X}$ = 25.47 years. 
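
Both parts of Exp. No. 6 can be checked with a short Python sketch of the short cut formula. The function names are assumptions made for the illustration; in part (a) the frequency at x = 15 is taken as 75, the value that agrees with the totals n = 384 and Σfd = −1070 used in the solution.

```python
def short_cut_mean(values, frequencies, assumed_mean):
    """Short cut formula: A + sum(f*d)/n with d = x - A."""
    n = sum(frequencies)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(values, frequencies))
    return assumed_mean + sum_fd / n


def direct_mean(values, frequencies):
    """Direct formula: sum(f*x)/n, used here only as a cross-check."""
    return sum(f * x for x, f in zip(values, frequencies)) / sum(frequencies)


# (a) branches per plant; frequency at x = 15 assumed to be 75 (see lead-in).
branches = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
plants = [20, 43, 75, 67, 72, 45, 39, 9, 8, 6]

# (b) ages: classes 0-10, 10-20, 20-30, 30-50, 50-80 replaced by mid-values.
age_mid_values = [5, 15, 25, 40, 65]
persons = [61, 49, 40, 60, 23]

mean_a = short_cut_mean(branches, plants, assumed_mean=25)
mean_b = short_cut_mean(age_mid_values, persons, assumed_mean=25)
print(f"(a) mean = {mean_a:.1f} branches/plant")
print(f"(b) mean = {mean_b:.2f} years")
```

Any value of A gives the same mean; a value near the centre of the distribution merely keeps the products fd small.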
Example No. 7 

What is the median ? Illustrate by an example how you would 
calculate the median. 

Solution: 

Median : The median is defined as the value of a 
variate which divides the distribution into two equal parts. 
Thus, half of the values lie below the median and half above it. 

Calculation of Median : 

1. (a) Simple series of odd items : 

In a simple series, the variate does not repeat itself. In 
such a series, if the number of items is odd, the computation 
of the median is very easy. 

First we arrange the values either in the ascending or in 
the descending order of their magnitudes and then find the size 
of the $\left(\frac{n+1}{2}\right)$th item in this series. It will be our desired value 
of the median. 

For illustration, consider the following example of 9 
items with magnitudes 2, 5, 7, 3, 11, 9, 8, 4 & 6. 

To find the median, the items are arranged in the 
ascending order of their magnitudes, i.e. 
2, 3, 4, 5, 6, 7, 8, 9, 11; then the size of the $\left(\frac{n+1}{2}\right)$th item 
gives the median. Therefore 

Md = size of the $\left(\frac{n+1}{2}\right)$th item 
   = size of the $\left(\frac{9+1}{2}\right)$th item = size of the 5th item 
   = 6 

∴ Median = 6 
1. (b) Simple series of even items : 

In this case, the median is the simple arithmetic average 
of the sizes of the $\left(\frac{n}{2}\right)$th item and the $\left(\frac{n}{2} + 1\right)$th item when the 
values have been arranged either in the ascending or in the 




Statistical Methods in Education 


descending order of their magnitudes. For example, take the 
case of finding the median from the data of eight (even 
number) values: 

62, 65, 67, 64, 72, 53, 59, 55 
The ascending arrangement is 
53, 55, 59, 62, 64, 65, 67, 72. Then the size of the 
$\left(\frac{n}{2}\right)$th item = size of the $\left(\frac{8}{2}\right)$th, i.e. 4th item = 62 
& size of the $\left(\frac{n}{2} + 1\right)$th item = size of the 5th item = 64. 

Hence, Md = simple arithmetic average of the sizes of the 
$\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th items 

$$= \frac{62 + 64}{2} = 63$$

∴ Median = 63 
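
The two rules for a simple series, odd and even, can be put together in one small Python function. This is a sketch only; the name `simple_median` is an assumption made for the illustration.

```python
def simple_median(values):
    """Median of a simple (ungrouped, non-repeating) series."""
    ordered = sorted(values)          # ascending arrangement
    n = len(ordered)
    if n % 2 == 1:
        # odd n: size of the (n+1)/2-th item (list indices start at 0)
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the (n/2)-th and (n/2 + 1)-th items
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(simple_median([2, 5, 7, 3, 11, 9, 8, 4, 6]))        # nine items
print(simple_median([62, 65, 67, 64, 72, 53, 59, 55]))    # eight items
```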

2. Ungrouped data of discrete variate : 

If a variate value $x_1$ repeats $f_1$ times, $x_2$ repeats $f_2$ times and so 
on, i.e. if we deal with a frequency distribution of the 
following type: 

Variate value (x)      frequency (f) 
$x_1$                  $f_1$ 
...                    ... 
$x_n$                  $f_n$ 

then there are the following steps in the calculation of the median. 

(i) Arrange the items either in ascending or descending 
order of their magnitudes, if they are not so. But usually in 
such a case the data are already arranged and a fresh 
arrangement is rarely required. 

(ii) Compute the cumulative frequencies of the variate 
values. 
(iii) Locate the median item, which is the $\left(\frac{n+1}{2}\right)$th item, where 
n is the total frequency. 

(iv) Find the value of the median, which is the size of 
the $\left(\frac{n+1}{2}\right)$th item. 


Consider the example given below to compute the 
median: 

x :  1   2   3   4   5   6 

f : 20  15  17  19  13  15 

For the computation of the median, we proceed as 
follows: 

x         f         c.f. 

1        20         20 
2        15         35 
3        17         52 
4        19         71 
5        13         84 
6        15         99 

Since the median is the size of the $\left(\frac{n+1}{2}\right)$th item, 
the median here is the size of the $\frac{99+1}{2}$ = 50th item. 
By inspecting the column of cumulative frequencies we note that 
the 50th item falls opposite the variate value 3 and so the 
median here is 3. 

In case the total of all frequencies is an even 
number, the median will be the mean of the sizes of the 
$\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th items. 
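
The four steps for a discrete frequency distribution can be sketched as below. The function name `discrete_median` is an assumption, and the variate values are assumed to be supplied already in ascending order.

```python
from itertools import accumulate


def discrete_median(values, frequencies):
    """Median of a discrete frequency distribution (values in ascending order)."""
    cumulative = list(accumulate(frequencies))   # step (ii): cumulative frequencies
    n = cumulative[-1]                           # total frequency

    def item(position):
        # size of the item at the given 1-based position:
        # the first variate value whose c.f. reaches that position
        for x, cf in zip(values, cumulative):
            if position <= cf:
                return x

    if n % 2 == 1:
        return item((n + 1) // 2)                # steps (iii) and (iv)
    return (item(n // 2) + item(n // 2 + 1)) / 2


x = [1, 2, 3, 4, 5, 6]
f = [20, 15, 17, 19, 13, 15]
print("c.f.   :", list(accumulate(f)))
print("median :", discrete_median(x, f))
```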


3. Grouped data of discrete & continuous variate : 

In the case of grouped data, the arrangement of the 
items according to their magnitudes is already done. What is 
required next is: 

(i) To calculate the cumulative frequencies. 

(ii) To locate the median class, which is the class containing the 
$\left(\frac{n+1}{2}\right)$th item. 

(iii) To determine the value of the median by applying 
the formula: 

$$Md = L + \frac{m - c}{f} \times i$$

where Md = median, $m = \frac{n+1}{2}$, 
L = lower limit of the median class, 
i = class interval of the median class, 
f = frequency of the median class, 
c = cumulative frequency of the 
class preceding the median class. 
For illustration, consider the following frequency 
distribution to find out the median: 

class        frequency (f) 

0-5             4 
5-10            6 
10-15          10 
15-20          16 
20-25          12 
25-30           8 
30-35           4 

To calculate the median for the above data, first we 
prepare the following table for c.f. 

class        frequency (f)        c.f. 

0-5             4                   4 
5-10            6                  10 
10-15          10                  20 
15-20          16                  36 
20-25          12                  48 
25-30           8                  56 
30-35           4                  60 
Here, we want the value of the $\left(\frac{n+1}{2}\right)$th = 30.5th item, 
which is located in the class 15-20. Then the median is 
determined by the formula: 

$$Md = L + \frac{m - c}{f} \times i$$

where $m = \frac{n+1}{2} = 30.5$, L = 15, c = 20, f = 16 and i = 5. 

$$Md = 15 + \frac{30.5 - 20}{16} \times 5 = 15 + 3.28$$

∴ median = 18.28 approximately. 
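
The interpolation can be sketched in Python as follows. The function name `grouped_median` is an assumption; the sketch follows the convention used in this worked example, namely m = (n + 1)/2 and c equal to the cumulative frequency reached just before the median class.

```python
def grouped_median(classes):
    """Median of a grouped distribution by interpolation.

    classes: list of (lower limit, upper limit, frequency) in ascending order.
    """
    n = sum(f for _, _, f in classes)
    m = (n + 1) / 2                     # position of the median item
    c = 0                               # c.f. accumulated before the current class
    for lower, upper, f in classes:
        if c + f >= m:                  # the median item falls in this class
            return lower + (m - c) / f * (upper - lower)
        c += f
    raise ValueError("empty distribution")


CLASSES = [(0, 5, 4), (5, 10, 6), (10, 15, 10), (15, 20, 16),
           (20, 25, 12), (25, 30, 8), (30, 35, 4)]
print(f"median = {grouped_median(CLASSES):.2f}")
```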


Example No. (8) 

What do you mean by the Quartiles ? Calculate the 
median, the upper and the lower quartiles from the data given 
below: 

% of recovery       No. of        % of recovery       No. of 
of sugar on cane    factories     of sugar on cane    factories 

8.0-8.2               2           9.4-9.6              10 
8.2-8.4               5           9.6-9.8               7 
8.4-8.6               4           9.8-10.0              6 
8.6-8.8              11           10.0-10.2             3 
8.8-9.0              11           10.2-10.4             1 
9.0-9.2              11           10.4-10.6             1 
9.2-9.4              13 

                                  Total                85 
Solution: 

The median is that value of the variate which 
divides the whole distribution in such a way that half of the 
observations lie below it and the remaining half above it when 
the data are arranged in the ascending or descending order of 
magnitudes. 

Similarly, the lower quartile, denoted by $Q_1$, is the 
value of the variate which divides the distribution in such a 
manner that one quarter of the observations lie below it 
and the remaining 3 quarters above it when the values have 
been arranged in the ascending order of their magnitudes. 

The upper quartile, denoted by $Q_3$, is the value of the 
variate which divides the distribution in such a way that 3 
quarters of the observations lie below it and 1 quarter above it 
when the data have been arranged in the ascending order of 
their magnitudes. 

Clearly $Q_2$ is the median. $Q_1$, $Q_3$ and the median are also 
called the partitioning values of the distribution. Other 
partitioning values are Pentiles, Deciles and Hectiles. 

The mathematical formulae for the computation of 
quartiles for grouped data are: 

$$Q_1 = L + \frac{q_1 - c}{f} \times i$$

where L = lower limit of the $Q_1$ class, 
$q_1 = \frac{n+1}{4}$, 
c = c.f. of the class preceding the $Q_1$ class, 
f = frequency of the $Q_1$ class, 
i = class interval of the $Q_1$ class, 
and 

$$Q_3 = L + \frac{q_3 - c}{f} \times i$$

where $q_3 = \frac{3(n+1)}{4}$, 
L = lower limit of the $Q_3$ class, 
f = frequency of the $Q_3$ class, 
i = class interval of the $Q_3$ class, 
c = c.f. of the class preceding the $Q_3$ class. 

The $Q_1$ class lies opposite the $\frac{n+1}{4}$th c.f. and the $Q_3$ class 
lies opposite the $\frac{3(n+1)}{4}$th c.f. For the present example, we have 
the following table: 


% recovery of          No. of              c.f. 
sugar on cane (x)      factories (f) 

8.0-8.2                  2                   2 
8.2-8.4                  5                   7 
8.4-8.6                  4                  11 
8.6-8.8                 11                  22 
8.8-9.0                 11                  33 
9.0-9.2                 11                  44 
9.2-9.4                 13                  57 
9.4-9.6                 10                  67 
9.6-9.8                  7                  74 
9.8-10.0                 6                  80 
10.0-10.2                3                  83 
10.2-10.4                1                  84 
10.4-10.6                1                  85 
$$Md = L + \frac{m - c}{f} \times i, \quad \text{where } m = \frac{85 + 1}{2} = 43$$

Hence the median class is 9.0-9.2 and so L = 9.0, 
i = 0.2, f = 11, c = 33. 

$$Md = 9.0 + \frac{43 - 33}{11} \times 0.2 = 9.0 + 0.1818$$

∴ median = 9.1818 

The lower quartile $Q_1 = L + \frac{q_1 - c}{f} \times i$, 

where $q_1 = \frac{85 + 1}{4} = 21.5$. 

Hence the lower quartile class is 8.6-8.8 and so L = 8.6, 
i = 0.2, f = 11, c = 11. 

$$Q_1 = 8.6 + \frac{21.5 - 11}{11} \times 0.2 = 8.6 + 0.1909$$

∴ $Q_1$ = 8.7909 

The upper quartile $Q_3 = L + \frac{q_3 - c}{f} \times i$, 

where $q_3 = \frac{3(85 + 1)}{4} = 64.5$. 

Hence the upper quartile class is 9.4-9.6 and so 
L = 9.4, i = 0.2, f = 10, c = 57. 

$$Q_3 = 9.4 + \frac{64.5 - 57}{10} \times 0.2 = 9.4 + 0.15$$

∴ $Q_3$ = 9.55 
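
Since the median and the two quartiles differ only in the position of the item sought, one interpolation routine serves for all three. The sketch below is an illustration under that reading; `partition_value` is a hypothetical helper name, and the positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 are the ones used in this example.

```python
def partition_value(classes, position):
    """Interpolated value of the item at a given position in a grouped distribution.

    classes: list of (lower limit, upper limit, frequency) in ascending order.
    """
    c = 0                                    # c.f. reached before the current class
    for lower, upper, f in classes:
        if c + f >= position:                # the required item lies in this class
            return lower + (position - c) / f * (upper - lower)
        c += f
    raise ValueError("position exceeds the total frequency")


SUGAR_RECOVERY = [
    (8.0, 8.2, 2), (8.2, 8.4, 5), (8.4, 8.6, 4), (8.6, 8.8, 11),
    (8.8, 9.0, 11), (9.0, 9.2, 11), (9.2, 9.4, 13), (9.4, 9.6, 10),
    (9.6, 9.8, 7), (9.8, 10.0, 6), (10.0, 10.2, 3), (10.2, 10.4, 1),
    (10.4, 10.6, 1),
]

n = sum(f for _, _, f in SUGAR_RECOVERY)
q1 = partition_value(SUGAR_RECOVERY, (n + 1) / 4)
md = partition_value(SUGAR_RECOVERY, (n + 1) / 2)
q3 = partition_value(SUGAR_RECOVERY, 3 * (n + 1) / 4)
print(f"Q1 = {q1:.4f}, median = {md:.4f}, Q3 = {q3:.4f}")
```

The same routine would give deciles or other partitioning values if the corresponding position is supplied.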
Example No. (10) 

The classification of 75 cows has been done according to 
one day's milk in the following table. Find the median. 

class interval        No. of cows 
(Milk in lbs) 

8-10                     4 
10-12                    8 
12-14                   12 
14-16                   25 
16-18                   15 
18-20                    8 
20-22                    3 

Solution: 

We first calculate the cumulative frequencies, which are 
given in the following table: 

class interval        frequency (f)        c.f. 

8-10                     4                   4 
10-12                    8                  12 
12-14                   12                  24 
14-16                   25                  49 
16-18                   15                  64 
18-20                    8                  72 
20-22                    3                  75 

The median is computed by the formula: 

$$Md = L + \frac{m - c}{f} \times i$$

where $m = \frac{n+1}{2} = \frac{75 + 1}{2} = 38$. 

Hence the median lies in the class 14-16 and so 
L = 14, f = 25, c = 24, i = 2. 

$$Md = 14 + \frac{38 - 24}{25} \times 2 = 14 + 1.12 = 15.12$$

∴ Median = 15.12 

Example No. 11 

The following table gives the marks obtained by a batch 
of candidates in a certain examination in History and Politics. 
In which subject is the level of knowledge of the candidates 
higher ? Give reasons. 

Roll No.    History    Politics       Roll No.    History    Politics 

1             42         46            9            40         30 
2             24         20           10            62         61 
3             38         41           11            55         50 
4             35         43           12            54         63 
5             30         25           13            52         45 
6             45         54           14            47         56 
7             58         47           15            43         58 
8             50         36 

                                                  (M.Sc., Agra, 1955) 

Solution: 

To find out the subject in which the knowledge of the 
students is higher, we calculate the medians of the marks 
obtained in the two subjects. In the subject for which the 
median is higher, the level of knowledge is higher. 

For this, we arrange the marks in the ascending order of 
magnitudes in each subject. 

History : 24, 30, 35, 38, 40, 42, 43, 45, 47, 50, 52, 54, 55, 58, 62 

Politics : 20, 25, 30, 36, 41, 43, 45, 46, 47, 50, 54, 56, 58, 61, 63 

The median will be the size of the $\left(\frac{n+1}{2}\right)$th item 
= size of the 8th item, since n = 15. 

Median for History marks = 45 
Median for Politics marks = 46 

Hence, the level of knowledge of the candidates is higher 
in Politics than in History. 

Example No. (12) 

Define Mode. Illustrate its method of calculation. 

Solution: 

Mode : The most frequent or the most popular value 
of a variate is called the mode. It is repeated the greatest 
number of times. In popular language, when we speak of the average 
student or the average rent, we generally imply the modal student 
or the modal rent. It is easy to locate, as it lies at the highest 
frequency. But if there are irregularities in the frequency 
distribution, the position of the mode may become indefinite. 
In such cases, the process of grouping will be applicable. In a 
discrete series, the size of the variable which has the maximum 
frequency is the mode, while in a continuous series the mode 
will be located by interpolation in the modal group by the 
formula 

$$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

where L stands for the lower limit of the modal class, 

$\Delta_1$ stands for the difference between 
the frequency of the modal class and 
that of the class which precedes the 
modal class, 

$\Delta_2$ stands for the difference between the 
frequency of the modal class and 
that of the class which follows 
the modal class, 

i stands for the class interval of the 
modal class. 
Let us consider the following age-distribution of the 
candidates appearing at the matriculation examination of 
Patna University in 1937: 

Age-group     No. of stud.     Age-group     No. of stud. 
(years)                        (years) 

12-13             5            17-18           980 
13-14            48            18-19           981 
14-15           189            19-20           794 
15-16           303            20-21           515 
16-17           522            21-22           474 

                               Total          4811 

Here the modal class is 18-19, corresponding to the 
highest frequency 981, and so 

L = 18, i = 19 − 18 = 1, $\Delta_1$ = 981 − 980 = 1, $\Delta_2$ = 981 − 794 = 187 

$$Mo = 18 + \frac{1}{1 + 187} \times 1 = 18 + 0.005 = 18.005$$

∴ Mode = 18.005 years. 
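
The interpolation for the mode can be sketched in Python as follows. The function name `grouped_mode` is an assumption; as in the worked example, the first difference is taken against the class before the modal class and the second against the class after it, and a missing neighbour is treated as having frequency 0.

```python
def grouped_mode(classes):
    """Mode of a grouped distribution by interpolation within the modal class.

    classes: list of (lower limit, upper limit, frequency) in ascending order.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                  # position of the modal class
    lower, upper, f_modal = classes[k]
    f_before = freqs[k - 1] if k > 0 else 0      # class preceding the modal class
    f_after = freqs[k + 1] if k < len(freqs) - 1 else 0   # class following it
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    return lower + d1 / (d1 + d2) * (upper - lower)


AGE_GROUPS = [(12, 13, 5), (13, 14, 48), (14, 15, 189), (15, 16, 303),
              (16, 17, 522), (17, 18, 980), (18, 19, 981), (19, 20, 794),
              (20, 21, 515), (21, 22, 474)]
print(f"mode = {grouped_mode(AGE_GROUPS):.3f} years")
```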





Example No. (13) 

Define dispersion. What are the different methods of 
measuring it ? Describe each briefly. (U.P. Board, 1962) 

Solution: 

Having found the value of the average of a series, 
we try to compute some statistics which can give an idea of how 
the values of the variate are scattered around the central value. 
This variation of the data from the central value is called the 
dispersion or scatter or spread or variability. 

Thus, the dispersion can be defined as the extent to 
which the magnitudes of the items differ from the central 
value. 


The following are the measures of dispersion: 

(i) Range 

(ii) Quartile deviation 

(iii) Mean deviation from (a) average (mean) 
                          (b) median 

(iv) Standard deviation 

(i) Range : It is defined as the difference between the 
greatest and the least magnitude of the items. It is very easy to 
calculate but its use as a measure of dispersion is very rare, 
because it does not depend upon all the observations (items) 
and in its computation no consideration is given to the value 
of the central tendency. 


(ii) Quartile deviation (Semi-interquartile range) : 
It is half of the interquartile range, i.e. 

$$Q.D. = \frac{Q_3 - Q_1}{2}$$

It is a better measure of dispersion than the range and is 
often used in elementary descriptive statistics. But due to its 
incapability of algebraic treatment, it is not used in the 
advanced theory. Another reason is that it gives an idea of 
dispersion only of those items which lie between $Q_1$ & $Q_3$ and 
so most of the items have no effect on the computation of 
dispersion. 

(iii) Mean deviation (average deviation): As the
computations of the range and the quartile deviation do not depend
upon all the observations, these measures cannot be regarded as
satisfactory measures of dispersion. One
measure free from this objection is the mean deviation, which
is defined as the arithmetic average of the absolute deviations
of the items from any measure (mean or median) of central
tendency. Mathematically we write:

$\text{Mean deviation} = \dfrac{\sum f|d|}{n}$

where n is the total number of items,
      f stands for the frequency of the corresponding value of the variate x,
and |d| stands for the absolute deviation of x from its average.

If the deviations are taken from the mean, it is called the
Mean Deviation about the mean, and if the deviations are taken from
the median, it is called the Mean Deviation about the
median. The mean deviation also suffers from one drawback:
it is not capable of further algebraic treatment.

For illustration of the procedure, we calculate the
mean deviation about the mean from the following data:

x : 2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5
f : 4, 6, 10, 16, 12, 8, 4

To compute the mean deviation about the mean, we
form the following table:

 (1)      (2)     (3)        (4)         (5)      (6)
  x        f      fx      d = x - M      |d|     f|d|
  2.5      4      10.0     -15.5        15.5     62.0
  7.5      6      45.0     -10.5        10.5     63.0
 12.5     10     125.0      -5.5         5.5     55.0
 17.5     16     280.0      -0.5         0.5      8.0
 22.5     12     270.0      +4.5         4.5     54.0
 27.5      8     220.0      +9.5         9.5     76.0
 32.5      4     130.0     +14.5        14.5     58.0
Totals    60    1080.0                          376.0
         = n    = Σfx                          = Σf|d|

Mean $\bar{X}$ or $M = \dfrac{\sum fx}{n} = \dfrac{1080}{60} = 18$

First of all, we calculate the mean $\bar{X} = 18$. Then we take
the deviations d of all values of the variate in col. (1) from their
actual mean 18, as shown in col. (4). In col. (5) we write the
absolute values of d, and in col. (6) the products of the
corresponding figures of cols. (2) and (5).

$\text{Mean deviation (about mean)} = \dfrac{\sum f|d|}{n} = \dfrac{376}{60} = 6.27$ approximately.

If we want to calculate the mean deviation about the
median, we first find the median for the data and then
take the deviations of the values from this median. Similarly,
the mean deviation about any average can be found
by first computing that average for the given data
and then obtaining the deviations of the values from that
average. The rest of the procedure is the same as indicated
above.
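The tabular procedure for the mean deviation about the mean can be sketched in Python as follows. The function name `mean_deviation` is an assumption made for this illustration; the data are those of the example.

```python
def mean_deviation(xs, fs):
    """Return (mean, mean deviation about the mean) of a frequency distribution."""
    n = sum(fs)                                    # total frequency
    mean = sum(f * x for x, f in zip(xs, fs)) / n  # sum(fx) / n
    md = sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n  # sum(f|d|) / n
    return mean, md


xs = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
fs = [4, 6, 10, 16, 12, 8, 4]
mean, md = mean_deviation(xs, fs)
print(mean, round(md, 2))  # 18.0 6.27
```

To obtain the mean deviation about the median instead, the deviations would be taken from the median rather than from `mean`.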

(iv) Standard deviation: Among all the measures
of dispersion, the S.D. is the most widely used because

(1) its computation is based on all the observations,

(2) the deviations are taken from the mean, and so the
sum of squares of the deviations is a minimum,

(3) it is capable of algebraic treatment.

The S.D. is the square root of the mean of the squares
of the deviations from the arithmetic mean. Symbolically, the Greek
letter sigma (σ) denotes the standard deviation:

$\sigma = \sqrt{\dfrac{\sum (X - \bar{x})^2}{n}}$

We can also take the deviations from an assumed mean A
in place of the actual mean for our convenience. Then the
formula for the S.D. will be

$\sigma = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$, where $d = X - A$, for a simple series, and

$\sigma = \sqrt{\dfrac{\sum fd^2}{n} - \left(\dfrac{\sum fd}{n}\right)^2}$, where $n = \sum f$, for a frequency distribution.


For illustration, we summarize the steps in the
computation of the S.D. by taking the following simple series of
values:

X : 192, 288, 236, 229, 184, 260, 348, 291, 330, 243

(1) Choose the assumed mean A.

(2) Take the deviations d from this assumed mean.

(3) Square the deviations, i.e. calculate d².

The results of the above steps for the given series are tabulated
below, with A = 260:

   X      d = X - A       d²
  192       -68          4624
  288       +28           784
  236       -24           576
  229       -31           961
  184       -76          5776
  260         0             0
  348       +88          7744
  291       +31           961
  330       +70          4900
  243       -17           289
Totals       +1         26615
           = Σd         = Σd²

The S.D. is given by

$\sigma = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2} = \sqrt{\dfrac{26615}{10} - \left(\dfrac{1}{10}\right)^2}$

$= \sqrt{2661.5 - 0.01}$

$= \sqrt{2661.49}$

σ = 51.59.
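A minimal Python sketch of the assumed-mean method for a simple series is given below. The function name `sd_assumed_mean` is hypothetical, and the sixth value of the series is taken as 260, which is what the tabulated deviation of 0 from the assumed mean A = 260 implies.

```python
import math


def sd_assumed_mean(values, assumed):
    """S.D. from deviations d = X - A: sqrt(sum(d^2)/n - (sum(d)/n)^2)."""
    n = len(values)
    d = [x - assumed for x in values]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


series = [192, 288, 236, 229, 184, 260, 348, 291, 330, 243]
sigma = sd_assumed_mean(series, assumed=260)
print(round(sigma, 2))  # 51.59
```

Any other assumed mean gives the same S.D.; a value near the centre of the data is chosen only to keep the deviations small.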


To compute the S.D. in the case of a frequency distribution, the
following steps are taken:

(1) choose the assumed mean A,

(2) take the deviations d = (X - A) from this assumed mean,

(3) calculate the squares of d,

(4) compute the products fd in a column,

(5) compute the products fd² in another column.

For illustration of the procedure, we calculate the S.D. from
the following data:

Height in cms. (X) : 60, 61, 62, 63, 64, 65, 66, 67, 68
No. of plants (f)  :  2,  0, 15, 29, 25, 12, 10,  4,  3

The computational work is done in the following tabular
form:

A = 64

   X       f     d = X - A     d²      fd      fd²
  60       2        -4         16      -8       32
  61       0        -3          9       0        0
  62      15        -2          4     -30       60
  63      29        -1          1     -29       29
  64      25         0          0       0        0
  65      12        +1          1     +12       12
  66      10        +2          4     +20       40
  67       4        +3          9     +12       36
  68       3        +4         16     +12       48
Totals   100                          -11      257
         = n                         = Σfd    = Σfd²

The S.D. is given by

$\sigma = \sqrt{\dfrac{\sum fd^2}{n} - \left(\dfrac{\sum fd}{n}\right)^2} = \sqrt{\dfrac{257}{100} - \left(\dfrac{-11}{100}\right)^2} = \sqrt{2.57 - 0.0121}$

$= \sqrt{2.5579}$

σ = 1.60 approximately.

Computation of the S.D. in the case of a frequency distribution
of grouped data with unequal intervals.

In this case, the classes are replaced by their mid-values and the
rest of the procedure remains the same as explained above.

In the case of grouped data where the class-intervals are equal,
a more convenient formula for calculating the S.D. is

$\sigma = i \times \sqrt{\dfrac{\sum f\xi^2}{n} - \left(\dfrac{\sum f\xi}{n}\right)^2}$, where i is the class-interval and $\xi = \dfrac{X - A}{i}$.

This is known as the method of step-deviation.
The steps according to this method are given below:

(1) choose the assumed mean A,

(2) take the deviations d = (X - A) from this assumed mean,

(3) divide the deviations by the class-interval i and call the
value ξ = d/i,

(4) calculate the squares of ξ,

(5) compute the products fξ, and

(6) finally compute the products fξ².

For illustration, consider the following example:

Age group (years) (X) : 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90
No. of persons (f)    :     3,    61,   132,   153,   140,    51,     2

The computational work for the S.D. is shown in the following
tabular form:

A = 55, i = 10

Age group    Mid-value     f     d = X - A    ξ = d/i     fξ      fξ²
 (years)         X
  20-30         25          3       -30         -3         -9      27
  30-40         35         61       -20         -2       -122     244
  40-50         45        132       -10         -1       -132     132
  50-60         55        153         0          0          0       0
  60-70         65        140       +10         +1       +140     140
  70-80         75         51       +20         +2       +102     204
  80-90         85          2       +30         +3         +6      18
 Totals                   542                             -15     765
                         = Σf                            = Σfξ   = Σfξ²

The S.D. is given by

$\sigma = 10 \times \sqrt{\dfrac{765}{542} - \left(\dfrac{-15}{542}\right)^2} = 10 \times \sqrt{1.4114 - 0.0008} = 10 \times \sqrt{1.4107}$

$= 10 \times 1.1877$

∴ σ = 11.88 approximately.
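The step-deviation method lends itself to a short Python sketch. The function name `sd_step_deviation` and its argument names are assumptions made for this illustration; the mid-values and frequencies are those of the age-group example.

```python
import math


def sd_step_deviation(mid_values, freqs, assumed, width):
    """S.D. of grouped data with equal class-intervals by step-deviations."""
    n = sum(freqs)
    steps = [(x - assumed) / width for x in mid_values]   # xi = (X - A) / i
    s1 = sum(f * u for f, u in zip(freqs, steps))         # sum of f * xi
    s2 = sum(f * u * u for f, u in zip(freqs, steps))     # sum of f * xi^2
    return width * math.sqrt(s2 / n - (s1 / n) ** 2)


mids = [25, 35, 45, 55, 65, 75, 85]
freqs = [3, 61, 132, 153, 140, 51, 2]
sigma_age = sd_step_deviation(mids, freqs, assumed=55, width=10)
print(round(sigma_age, 2))  # 11.88
```

Carrying full precision through the square root gives 11.88; truncating the intermediate figures by hand gives 11.87.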

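Exp. 14. Prove that $\sum x^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}$,

where x is the deviation of an observation X from
the arithmetic mean $\bar{x}$ of the sample and n is the total
number of observations in the sample.

(M. Sc. Ag. Agra, 1963)

Sol. We have $x = X - \bar{x}$, where $\bar{x} = \dfrac{\sum X}{n}$, so that $\sum X = n\bar{x}$.

∴ $x^2 = (X - \bar{x})^2$

and $\sum x^2 = \sum (X - \bar{x})^2$

$= \sum [X^2 + \bar{x}^2 - 2X\bar{x}]$

$= \sum X^2 + n\bar{x}^2 - 2\bar{x}\sum X$

$= \sum X^2 + n\bar{x}^2 - 2n\bar{x}^2$

$= \sum X^2 - n\bar{x}^2$

$= \sum X^2 - n\left(\dfrac{\sum X}{n}\right)^2$

∴ $\sum x^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}$

Hence proved.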
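The identity can also be checked numerically. The five values below are hypothetical and chosen only so that the arithmetic is easy to follow; they do not come from the text.

```python
# Hypothetical sample, used only to illustrate the identity
values = [2, 4, 4, 5, 10]
n = len(values)
mean = sum(values) / n                                   # 5.0

lhs = sum((X - mean) ** 2 for X in values)               # sum of squared deviations
rhs = sum(X * X for X in values) - sum(values) ** 2 / n  # sum(X^2) - (sum X)^2 / n
print(lhs, rhs)  # 36.0 36.0
```

The right-hand form is convenient in practice because it needs only the running totals of X and X², not the individual deviations.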

Exp. 15. Compute the S.D. of the following data by means
of the formulae:

(a) $\sigma = \sqrt{\dfrac{\sum f(X - \bar{x})^2}{n}}$

(b) $\sigma = \sqrt{\dfrac{\sum fX^2}{n} - \left(\dfrac{\sum fX}{n}\right)^2}$

and (c) $\sigma = \sqrt{\dfrac{\sum fd^2}{n} - \left(\dfrac{\sum fd}{n}\right)^2}$

X : 12, 13, 14, 15, 16, 17, 18, 20
f :  4, 11, 32, 21, 15,  8,  5,  4

Sol.: (a) For the computation of the S.D., the table is given
below:

$\bar{x} = 15$

   X       f       fX     (X - x̄)    (X - x̄)²    f(X - x̄)²
  12       4       48       -3          9           36
  13      11      143       -2          4           44
  14      32      448       -1          1           32
  15      21      315        0          0            0
  16      15      240       +1          1           15
  17       8      136       +2          4           32
  18       5       90       +3          9           45
  20       4       80       +5         25          100
Totals   100     1500                              304
        = Σf    = ΣfX                          = Σf(X - x̄)²

$\bar{x} = \dfrac{\sum fX}{\sum f} = \dfrac{1500}{100} = 15$

Thus the S.D. is given by

$\sigma = \sqrt{\dfrac{304}{100}} = \sqrt{3.04}$

∴ σ = 1.74 approximately.

(b) To compute the S.D., the following table is framed:

   X       f       X²      fX       fX²
  12       4      144      48       576
  13      11      169     143      1859
  14      32      196     448      6272
  15      21      225     315      4725
  16      15      256     240      3840
  17       8      289     136      2312
  18       5      324      90      1620
  20       4      400      80      1600
Totals   100             1500     22804
        = Σf            = ΣfX    = ΣfX²

The S.D. is given by

$\sigma = \sqrt{\dfrac{22804}{100} - \left(\dfrac{1500}{100}\right)^2} = \sqrt{228.04 - 225}$

∴ $\sigma = \sqrt{3.04} = 1.74$ approximately.

(c) The S.D. is computed with the help of the following table:

A = 16

   X       f     d = (X - A)     d²      fd      fd²
  12       4        -4           16     -16       64
  13      11        -3            9     -33       99
  14      32        -2            4     -64      128
  15      21        -1            1     -21       21
  16      15         0            0       0        0
  17       8        +1            1      +8        8
  18       5        +2            4     +10       20
  20       4        +4           16     +16       64
Totals   100                           -100      404
        = Σf                           = Σfd    = Σfd²

$\sigma = \sqrt{\dfrac{404}{100} - \left(\dfrac{-100}{100}\right)^2} = \sqrt{4.04 - 1}$

$= \sqrt{3.04}$

∴ σ = 1.74 approximately.

The calculations have been considerably reduced by using

$\sigma = \sqrt{\dfrac{\sum fd^2}{n} - \left(\dfrac{\sum fd}{n}\right)^2}$, and so it is called the short-cut formula.
Exp. 16. Calculate the median and the quartiles of the following
data, using

(a) the mathematical method, and

(b) the graphical method:

Class     Frequency
 5-7          4
 7-9          6
 9-11        10
11-13        12
13-15         6
15-17         5
17-19         4

Sol.: (a) First we prepare the cumulative frequency table:

Class     Frequency     c.f.
 5-7          4           4
 7-9          6          10
 9-11        10          20
11-13        12          32
13-15         6          38
15-17         5          43
17-19         4          47 = n

We have

$Md = L + \dfrac{m - c}{f} \times i$, where $m = \dfrac{n + 1}{2} = \dfrac{48}{2} = 24$.

So the median lies in the class 11-13, and

$Md = 11 + \dfrac{24 - 20}{12} \times 2 = 11 + 0.66$

∴ Median = 11.66

The lower quartile is

$Q_1 = L + \dfrac{q_1 - c}{f} \times i$, where $q_1 = \dfrac{n + 1}{4} = 12$, and so

$Q_1 = 9 + \dfrac{12 - 10}{10} \times 2 = 9 + 0.4 = 9.4$

∴ $Q_1 = 9.4$

The upper quartile is

$Q_3 = L + \dfrac{q_3 - c}{f} \times i$, where $q_3 = \dfrac{3(n + 1)}{4} = 3 \times 12 = 36$, and

so $Q_3 = 13 + \dfrac{36 - 32}{6} \times 2 = 13 + 1.33 = 14.33$

∴ $Q_3 = 14.33$

(b) To determine any partition-value, i.e. the median, the quartiles
etc., we first draw a c.f. curve (ogive) and then draw lines
perpendicular to the y-axis corresponding to the partition item,
i.e. m, q₁, q₃ etc., as the case may be. Next, from the points where
these lines cut the ogive, we draw lines perpendicular to the x-axis.
The distances between the origin and the feet of these perpendiculars
on the x-axis are the partition-values.

[Graph No. (1): C.F. Curve (Ogive)]

In the present example, we take points at 12, 24 and 36 on the
y-axis, and from these points lines parallel to the x-axis are drawn.
The points where these lines cut the ogive are noted. Finally,
perpendiculars are drawn from these points to the x-axis. The
abscissae of these points give the desired partition-values.

Results:      Mathematical value     Graphical value
   Q₁               9.4                    9.4
   Md              11.66                  11.6
   Q₃              14.33                  14.3
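The interpolation for the median and quartiles can be sketched in Python as below. The helper name `partition_value` is an assumption of this illustration; it follows the convention used in the solution, in which the median item is (n + 1)/2 and the quartile items are (n + 1)/4 and 3(n + 1)/4.

```python
def partition_value(lower_limits, freqs, width, rank):
    """Value of the item of given rank: L + (rank - c) / f * i."""
    cum = 0  # cumulative frequency of the classes already passed (c)
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= rank:
            return lower + (rank - cum) / f * width
        cum += f
    raise ValueError("rank exceeds the total frequency")


lowers = [5, 7, 9, 11, 13, 15, 17]
freqs = [4, 6, 10, 12, 6, 5, 4]
n = sum(freqs)  # 47

median = partition_value(lowers, freqs, 2, (n + 1) / 2)
q1 = partition_value(lowers, freqs, 2, (n + 1) / 4)
q3 = partition_value(lowers, freqs, 2, 3 * (n + 1) / 4)
print(round(q1, 2), round(median, 2), round(q3, 2))  # 9.4 11.67 14.33
```

The unrounded median is 11.666..., which the text truncates to 11.66.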


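Exp. 17. (a) If $y = ax + b$ and the variance of x is $\sigma_x^2$, then show
that $\sigma_y^2 = a^2\sigma_x^2$.

(b) If x and y are two independent variates with
respective variances $\sigma_x^2$ and $\sigma_y^2$, and a new variate z is defined by the
relation $z = ax + by$, then show that $\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2$.

(c) If the standard deviations of two independent variates x and y are
0.5 and 1.2 respectively, find the S.Ds. of the variates
(x + y), (x - y) and (3x - 4y).

Sol. (a) We know that

$\sigma_y^2 = \dfrac{\sum (ax + b)^2}{n} - \left\{\dfrac{\sum (ax + b)}{n}\right\}^2$

$= \dfrac{\sum (a^2x^2 + 2abx + b^2)}{n} - (a\bar{x} + b)^2$

$= a^2\dfrac{\sum x^2}{n} + 2ab\bar{x} + b^2 - a^2\bar{x}^2 - 2ab\bar{x} - b^2$

$= a^2\left\{\dfrac{\sum x^2}{n} - \bar{x}^2\right\}$

∴ $\sigma_y^2 = a^2\sigma_x^2$. Hence proved.

(b) We have

$\sigma_z^2 = \dfrac{\sum (ax + by)^2}{n} - \left\{\dfrac{\sum (ax + by)}{n}\right\}^2$

$= \dfrac{\sum (a^2x^2 + b^2y^2 + 2abxy)}{n} - (a\bar{x} + b\bar{y})^2$

$= a^2\left\{\dfrac{\sum x^2}{n} - \bar{x}^2\right\} + b^2\left\{\dfrac{\sum y^2}{n} - \bar{y}^2\right\} + 2ab\left\{\dfrac{\sum xy}{n} - \bar{x}\bar{y}\right\}$

$= a^2\sigma_x^2 + b^2\sigma_y^2 + \dfrac{2ab}{n}\{\sum xy - n\bar{x}\bar{y}\}$

$= a^2\sigma_x^2 + b^2\sigma_y^2 + \dfrac{2ab}{n}\sum (x - \bar{x})(y - \bar{y})$

$= a^2\sigma_x^2 + b^2\sigma_y^2$,

because $\sum (x - \bar{x})(y - \bar{y}) = 0$, as x and y are independent
variates.

∴ $\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2$. Hence proved.

(c) Since the given figures are standard deviations, the variances are
V(x) = (0.5)² = 0.25 and V(y) = (1.2)² = 1.44.

V(x + y) = V(x) + V(y), when x and y are independent and a = 1, b = 1
         = 0.25 + 1.44
         = 1.69

∴ S.D. (x + y) = √(1.69) = 1.3.

Similarly, V(x - y) = V(x) + V(y), where a = 1, b = -1
                    = 0.25 + 1.44
                    = 1.69

∴ S.D. (x - y) = √(1.69) = 1.3.

Also, V(3x - 4y) = 9 × V(x) + 16 × V(y), where a = 3, b = -4
                 = 9 × 0.25 + 16 × 1.44
                 = 2.25 + 23.04
                 = 25.29

∴ S.D. (3x - 4y) = √(25.29) = 5.03 approximately.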
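Part (c) can be checked with a small Python sketch. The function name `sd_linear_combination` is hypothetical. The sketch treats 0.5 and 1.2 as standard deviations, so the variances entering the formula of part (b) are their squares, 0.25 and 1.44, and it assumes x and y are independent.

```python
import math


def sd_linear_combination(a, sd_x, b, sd_y):
    """S.D. of z = a*x + b*y for independent x and y: sqrt(a^2 Vx + b^2 Vy)."""
    return math.sqrt(a * a * sd_x ** 2 + b * b * sd_y ** 2)


sd_sum = sd_linear_combination(1, 0.5, 1, 1.2)     # S.D. of x + y
sd_diff = sd_linear_combination(1, 0.5, -1, 1.2)   # S.D. of x - y
sd_mixed = sd_linear_combination(3, 0.5, -4, 1.2)  # S.D. of 3x - 4y
print(round(sd_sum, 2), round(sd_diff, 2), round(sd_mixed, 2))  # 1.3 1.3 5.03
```

The sum and the difference have the same S.D. because the coefficient b enters the formula only through its square.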
Exp. 18. Write notes on:

(a) Variance,

(b) Standard error,

(c) Coefficient of variation.

Sol. (a) Variance: It is the square of the standard
deviation, i.e.

$\text{Variance} = (S.D.)^2 = \sigma^2 = \dfrac{\sum (X - \bar{x})^2}{n}$

where X represents the variate values,
      $\bar{x}$ the mean of the variate values,
      n the total number of items.

Thus, the variance is the arithmetic mean of the squares of the deviations
from the mean. It is denoted by σ² and is widely used in the
statistical analysis of field experiments.

(b) Standard error: It is the standard deviation of any
statistic calculated on the basis of sample observations. It is widely
used in the testing of statistical hypotheses. The S.E. of the mean
is given by

$S.E. = \dfrac{S.D.}{\sqrt{n}}$,

where n is the total number of observations on
which the mean has been calculated.

(c) Coefficient of variation: The S.D. and other measures of
dispersion are absolute measures of dispersion and are
expressed in the same units in which the observations are given,
and hence cannot be used to compare the variations in two
given series which differ in their units and averages. To compare
the variations of two such series, we require some relative measure
of dispersion. The coefficient of variation is such a measure.
It is defined as the ratio of the S.D. to the mean, expressed as a
percentage. Symbolically,

$C.V. = \dfrac{\sigma}{\bar{x}} \times 100$.

This statistic was developed by Pearson. In order to compare
the variations in two series, we compute the coefficient of
variation for each of them. The series for which the C.V. is
higher varies more than the other.
Exp. 19. The scores of two golfers for 10 rounds each are:

A : 58, 59, 60, 54, 65, 66, 52, 75, 69, 52
B : 84, 56, 92, 65, 86, 78, 44, 54, 73, 68

Which may be regarded as the more consistent player?

Sol. In the present problem, we have to compute the
coefficients of variation in the two cases and then compare them.
The player whose scores show less variability will be called
the more consistent player.

The computation is made in the following tabular form:

               Golfer A                        Golfer B
Round     X    d = X - 61     d²         Y    d = Y - 70     d²
  1      58       -3           9        84      +14         196
  2      59       -2           4        56      -14         196
  3      60       -1           1        92      +22         484
  4      54       -7          49        65       -5          25
  5      65       +4          16        86      +16         256
  6      66       +5          25        78       +8          64
  7      52       -9          81        44      -26         676
  8      75      +14         196        54      -16         256
  9      69       +8          64        73       +3           9
 10      52       -9          81        68       -2           4
Totals  610        0         526       700        0        2166

Mean (A) = 610/10 = 61                 Mean (B) = 700/10 = 70

S.D. (A) = √(526/10) = √(52.6)         S.D. (B) = √(2166/10) = √(216.6)
         = 7.252 approx.                        = 14.717 approx.

C.V. (A) = (7.252/61) × 100            C.V. (B) = (14.717/70) × 100
         = 11.88                                = 21.024

Comparing the two C.Vs., we conclude that player
A is more consistent than player B.
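The comparison of the two golfers can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is an assumption of this illustration; the S.D. is computed with divisor n, as elsewhere in the text.

```python
import math


def coefficient_of_variation(scores):
    """C.V. = S.D. / mean * 100, with the S.D. computed using divisor n."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return sd / mean * 100


golfer_a = [58, 59, 60, 54, 65, 66, 52, 75, 69, 52]
golfer_b = [84, 56, 92, 65, 86, 78, 44, 54, 73, 68]
cv_a = coefficient_of_variation(golfer_a)
cv_b = coefficient_of_variation(golfer_b)
print(round(cv_a, 2), round(cv_b, 2))  # 11.89 21.02
```

The smaller coefficient belongs to golfer A, who is therefore the more consistent of the two.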

Exp. 20. (a) Explain the practical utility of the coefficient of
variation in biological research.

(b) The average heights of 10 years old and 18 years old
girls were reported to be 74.4 and 161.0 cms. respectively, with
standard deviations 2.64 and 6.12 cms. respectively. Which data
varies more? (M. Sc. Ag. Agra, 1963)

Sol. (a) It has been answered in the above question.

(b) The C.V. for the heights of 10 years old girls is

$C.V._1 = \dfrac{2.64}{74.4} \times 100 = 3.548$.

The C.V. for the heights of 18 years old girls is

$C.V._2 = \dfrac{6.12}{161.0} \times 100 = 3.801$.

Comparing the two C.Vs., we conclude that the heights of the
18 years old girls show greater variability than those of the 10
years old girls.

Exp. 21. A random sample from a biological population is
given below:

Observation No.         :  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
No. of grains/earhead   : 19  30  11  31  32  36  39  24  12  57  39  29  34  53  33

In the above series, the sum of all the observations is 479 and the sum
of squares of deviations from the mean is 2272.93.

(a) Calculate the following for the above sample:
Mean, variance, standard deviation and standard error.

(b) What is an array? Illustrate your answer with the series
given above.

(c) In samples of 15-20 items, the range is on the average
about 3.5 times the standard deviation. Verify this statement from
the sample given above. (M. Sc. Ag. Agra, 1964)

Sol. (a) We have

n = 15
ΣX = 479
Σ(X - x̄)² = 2272.93.

Hence, the mean $\bar{x} = \dfrac{\sum X}{n} = \dfrac{479}{15} = 31.93$ grains/earhead,

Variance $\sigma^2 = \dfrac{\sum (X - \bar{x})^2}{n} = \dfrac{2272.93}{15} = 151.5286$,

S.D. $\sigma = \sqrt{151.5286} = 12.309$,

S.E. (of mean) $= \dfrac{S.D.}{\sqrt{n}} = \sqrt{\dfrac{\text{variance}}{n}} = \sqrt{\dfrac{151.5286}{15}} = \sqrt{10.1019}$

= 3.18 approx.

(b) Array: An orderly arrangement of the variate values is
called an array. If the variate values, i.e. the numbers of grains per
earhead, are arranged in either ascending or descending order of
their magnitudes, the resulting series will be called an array. The
practical utility of an array lies in the calculation of partition
values.

(c) In the given sample, the lowest number of grains per
earhead is 11 and the highest is 57, so the range is given by:

R = 57 - 11 = 46 grains/earhead.
We have the S.D. = 12.309,
so 3.5 × S.D. = 3.5 × 12.309
             = 43.08,

which is nearly equal to the range, i.e. 46 grains/earhead.

Thus, we can say that the range is roughly 3.5 times
the S.D. in the sample given above.
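The figures of this example can be recomputed directly from the fifteen observations, as in the Python sketch below. The third and eighth observations are read as 11 and 24, which is what the stated total of 479 and the stated sum of squares of deviations require; the variable names are assumptions of this illustration.

```python
import math

grains = [19, 30, 11, 31, 32, 36, 39, 24, 12, 57, 39, 29, 34, 53, 33]
n = len(grains)

mean = sum(grains) / n
ss = sum((x - mean) ** 2 for x in grains)    # sum of squares of deviations
variance = ss / n
sd = math.sqrt(variance)
se = sd / math.sqrt(n)                       # standard error of the mean
value_range = max(grains) - min(grains)

print(round(mean, 2), round(variance, 2), round(sd, 2), round(se, 2))
print(value_range, round(3.5 * sd, 2))       # range compared with 3.5 x S.D.
```

The range of 46 and the product 3.5 × S.D. of about 43 agree only roughly, which is all that the rule of thumb in part (c) claims.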
Exp. 22. A random sample from a field experiment is given
below:

Obs. No.                :  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
Hts. of plants in cms.  : 23  17  20  29  18  22  16  25  13  15  19  21  23  20  21  22  16

In this sample, the sum of all the observations is 340 and the
sum of squares of deviations from the mean is 254.

(a) Calculate six statistical constants, three each of measures
of type and measures of variability, from the above sample.

(b) Explain what is an array with the help of the above
sample. (…, 1965)

Sol. (a) We have

n = 17
ΣX = 340
Σ(X - x̄)² = 254

The three measures of central tendency taken are the Mean, the Median
and the H. M. (the Mode cannot be found in the present
example as there are no frequencies).

The mean $\bar{x} = \dfrac{\sum X}{n} = \dfrac{340}{17} = 20.0$ cms.

The median can be found by arranging the values (heights
of plants in cms.) in ascending order of their magnitudes. The
arrangement will be as follows:

13, 15, 16, 16, 17, 18, 19, 20, 20, 21, 21, 22, 22, 23, 23, 25, 29

Md = size of the $\left(\dfrac{n + 1}{2}\right)$th item = size of the $\left(\dfrac{17 + 1}{2}\right)$th item
   = size of the 9th item.

The size of the 9th item in the orderly arranged series is 20,
and so the

Median = 20.0 cms.

The three averages (measures of central tendency) are found
to be nearly equal.

For the three measures of dispersion, we take the range, the
S.D. and the mean deviation (about the mean), as computed
below:

Range (R) = highest value - lowest value
          = 29 - 13 = 16 cms.

S.D. $\sigma = \sqrt{\dfrac{254}{17}} = \sqrt{14.9412}$

σ = 3.865 cms. approximately.

Mean Deviation (M. D.) $= \dfrac{\sum |d|}{n}$,

where |d| means the absolute value of the deviation from the
actual mean $\bar{x}$, i.e. $|d| = |X - \bar{x}|$.

To compute it, we take the absolute deviations of the values of the
array from the mean 20: 7, 5, 4, 4, 3, 2, 1, 0, 0, 1, 1, 2, 2, 3, 3, 5, 9,
so that Σ|d| = 52 and

Mean deviation = 52/17 = 3.0588 cms. approx.

Hence we have the calculated values:

Measures of average          Measures of dispersion

Mean   = 20.0 cms.           Range = 16 cms.
Median = 20.0 cms.           S. D. = 3.865 cms.
H. M.  = 19.9 cms.           M. D. = 3.0588 cms.

(b) The answer has been given in the previous question.
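The array, the median and the three measures of dispersion used in this example can be obtained with the following Python sketch. The first and fourth observations are read as 23 and 29, an assumption that the stated total of 340, the sum of squares 254 and the ordered array all support; the variable names are illustrative only.

```python
import math

heights = [23, 17, 20, 29, 18, 22, 16, 25, 13, 15, 19, 21, 23, 20, 21, 22, 16]

array = sorted(heights)             # the array: values in ascending order
n = len(array)
median = array[(n + 1) // 2 - 1]    # size of the (n + 1)/2-th item, n being odd

mean = sum(array) / n
value_range = array[-1] - array[0]
sd = math.sqrt(sum((x - mean) ** 2 for x in array) / n)
md = sum(abs(x - mean) for x in array) / n  # mean deviation about the mean

print(mean, median, value_range, round(sd, 3), round(md, 4))
# 20.0 20 16 3.865 3.0588
```

Sorting first makes both the median and the range immediate, which is the practical use of an array mentioned in the previous example.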

Exp. 23. A count of the number of grains on each of forty
earheads of wheat gave the following results:

32 23 25 19 26 17 31 23

29 39 34 26 27 38 27 34

33 27 29 27 31 38 37 39

18 34 35 40 34 43 24 31

28 43 20 33 28 34 29 28

Present the data in the form of a frequency distribution.
Calculate the mean and standard deviation of the distribution from
the grouped data and state the standard error of the mean number of
grains per earhead. (M. Sc. Ag. Agra, 1958)

Sol. The lowest magnitude is 17 and the highest is 43. Taking
the lower limit of the first class as 16 and keeping the class-interval
equal to 4, we obtain the following seven classes with their
respective frequencies. The border-line items are placed in the
classes of which they are the lower limits.

Class      Frequency
16-20          3
20-24          4
24-28          7
28-32          8
32-36         10
36-40          5
40-44          3
Total         40




Statistical Methods in Education 


To compute the mean, the S.D. and the standard error of the mean
number of grains per earhead, we form the following table:

A = 30

Mid-value X    d = X - A     f      fd      fd²
    18           -12         3     -36      432
    22            -8         4     -32      256
    26            -4         7     -28      112
    30             0         8       0        0
    34            +4        10     +40      160
    38            +8         5     +40      320
    42           +12         3     +36      432
  Totals                    40     +20     1712
                           = n    = Σfd    = Σfd²

The mean is given by

$\bar{X} = A + \dfrac{\sum fd}{n} = 30 + \dfrac{20}{40} = 30 + 0.5$

∴ $\bar{X} = 30.5$ grains.

$S.D. = \sqrt{\dfrac{\sum fd^2}{n} - \left(\dfrac{\sum fd}{n}\right)^2} = \sqrt{\dfrac{1712}{40} - \left(\dfrac{20}{40}\right)^2} = \sqrt{42.80 - 0.25} = \sqrt{42.55}$

∴ S.D. = 6.52 approximately.

The S.E. of the mean number of grains per earhead is given by

$S.E. = \dfrac{S.D.}{\sqrt{n}} = \dfrac{6.52}{\sqrt{40}} = \dfrac{6.52}{6.32} = 1.03$.

Hence:
Mean = 30.5 grains
S. D. = 6.52 grains
S. E. of the mean number of grains/earhead = 1.03 grains
Exp. 24. The lengths of 50 earheads of wheat are given below
in centimetres:

10.4  10.8   8.5  10.9  10.5   9.4  11.1   7.0   8.8  10.6
10.9  11.2  10.5   9.7  10.8   8.4   9.3  11.2   7.0  11.3
12.2   8.9   9.6  11.0  10.4   9.9   9.0  10.5   9.6  11.0
11.5   9.7  10.6  10.4   9.5   8.4   8.7   9.1  11.5  10.6
11.8   7.8  10.1  10.1   8.4  11.0   9.5   9.4   9.8  11.7

(a) Present the data in the form of a frequency distribution
by choosing a suitable class-interval.

(b) Calculate the mean and median of the distribution and
represent it graphically. (M. Sc. Agra, 1960)

Sol. (a) The lowest magnitude is 7.0 and the highest is 12.2.
Taking the lower limit of the first class as 7.0 and keeping the
class-interval equal to 0.8, we obtain the following seven classes with
their respective frequencies written against them. The border-line
items are placed in the classes of which they are the lower
limits:

  Class        Frequency
 7.0- 7.8          2
 7.8- 8.6          5
 8.6- 9.4          6
 9.4-10.2         13
10.2-11.0         13
11.0-11.8          9
11.8-12.6          2
  Total           50

(b) To calculate the mean of the distribution, we form the
following table:

A = 9.8

Mid-value X    d = X - A     f       fd
    7.4          -2.4        2      -4.8
    8.2          -1.6        5      -8.0
    9.0          -0.8        6      -4.8
    9.8           0         13       0
   10.6          +0.8       13     +10.4
   11.4          +1.6        9     +14.4
   12.2          +2.4        2      +4.8
  Totals                 Σf = 50    12.0

Hence the mean is

$\bar{X} = A + \dfrac{\sum fd}{n} = 9.8 + \dfrac{12.0}{50} = 9.8 + 0.24$

∴ $\bar{X} = 10.04$ centimetres.


For the computation of the median of the distribution, we prepare
the following table:

  Class       Freq. (f)     c.f.
 7.0- 7.8         2           2
 7.8- 8.6         5           7
 8.6- 9.4         6          13
 9.4-10.2        13          26
10.2-11.0        13          39
11.0-11.8         9          48
11.8-12.6         2          50

Thus the median is

$Md = L + \dfrac{m - c}{f} \times i$, where $m = \dfrac{n + 1}{2} = 25.5$.

Hence the median lies in the class 9.4-10.2, and so

L = 9.4, c = 13, f = 13, i = 0.8

∴ $Md = 9.4 + \dfrac{25.5 - 13}{13} \times 0.8 = 9.4 + 0.7692 = 10.1692$

Md = 10.17 centimetres.

Graphically the median = 10.2 cms., and it is shown in the
adjacent graph.

[Graph No. 2]


Exp. 25. (a) In a series of 75 items, the total of all the items
was found to be 114.55 and their sum of squares 175.7125.
Calculate the mean and the standard deviation.

(b) For the marks obtained by 75 students in a certain test,
the sum of the deviations from the assumed mean 17.5 was 330
and the sum of squares of the deviations was 6250. Calculate the
mean and the standard deviation.

Sol. (a) We are given n = 75
                     Σx = 114.55
                     Σx² = 175.7125.

∴ Mean $= \dfrac{\sum x}{n} = \dfrac{114.55}{75} = 1.527$

$S.D. = \sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2} = \sqrt{2.3429 - 2.3327}$

$= \sqrt{0.0102}$

∴ σ = 0.10 approximately.

(b) We have n = 75
            A = 17.5
            Σd = 330
            Σd² = 6250

Mean $= A + \dfrac{\sum d}{n} = 17.5 + \dfrac{330}{75}$

∴ Mean = 17.5 + 4.4 = 21.9.

$S.D. = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2} = \sqrt{83.3333 - 19.36} = \sqrt{63.9733}$

∴ σ = 8.0 approximately.

Exp. 26. How many types of frequency functions do you
know? Sketch their graphs also.

Sol. The following are the types of simple frequency
distributions:

(1) Symmetrical unimodal curves: This type of curve
is symmetrical about its maximum ordinate. The class-frequencies
go on decreasing symmetrically to zero on both sides of the
central (maximum) ordinate. A very important curve of this type is the
normal curve.

[Fig. 1: Symmetrical unimodal curve]

(2) Skewed unimodal curves: In this type of curve, the
frequencies fall more rapidly on one side of the maximum ordinate
than on the other. Such curves have a tail on one side of the
maximum ordinate. If the tail is on the right hand side of the maximum
ordinate, the curve is said to be positively skewed.
When the tail is on the left hand side of the maximum ordinate,
the curve is said to be negatively skewed.

[Fig. 2 (a) and Fig. 2 (b): Skewed unimodal curves]

(3) J-shaped curves: In this type of distribution, the
maximum frequency falls at one end of the distribution, as in
bank-balances and other income-distributions. They may be
positively or negatively J-shaped.

[Fig. 3: J-shaped curves]

(4) U-shaped or anti-modal curves: In this type of curve,
the frequencies start from a maximum value, then fall to a minimum
and again increase. These curves may or may not be symmetrical.
Distributions of this type occur in the number of unemployed
persons by age-groups.

[Fig. 4 (a) and Fig. 4 (b): U-shaped curves]
EXERCISE No. (I)

(Problems on Ch. I, II, III)

Q. 1. The yield of rice in 55 equal plots of a village is given
in maunds as follows:

62 59 23 27 77 33 43 13 47 89 32

63 24 17 72 30 45 42 8 26 36 21

25 13 64 37 12 48 5 29 41 40 51

35 67 36 18 28 92 92 2 20 55 27

65 52 15 21 78 34 40 16 57 81 31

Prepare a frequency table taking a suitable class-interval and draw the
histogram, the frequency polygon and the frequency curve.

(U. P. Board, 1964)

Q. 2. (a) What is a frequency distribution? What considerations
are involved in forming a frequency table from a body of
observational data on a quantitative character?

(M. Sc. Ag. Agra, 1956)

(b) Make a frequency table having grades of wages with
class-intervals of five annas each from the following data of daily wages
received by 40 labourers in a certain factory:

 5, 15, 30, 22, 30, 25, 40, 10

 6, 20, 15, 25, 32, 22, 20, 10

 7,  8, 17, 20, 32, 22, 11, 15

 8, 11, 11, 22,  8, 35, 25, 35

11, 25,  6, 15, 20, 37, 20, 42

Q. 3. Draw the ogive of the classified data obtained from
Q. No. (2) (b) and find out the median and quartiles from it graphically.

Q. 4. In the following table, the heights of 2 plants are given
in centimetres at various ages:

  Age         Height of plant
(days)      A in cm.    B in cm.
   40         153         156
   60         167         173
   80         182         180
  100         187         185
  120         198         195
  140         201         200
  160         219         219
  180         220         231
  200         220         231

Draw the graphs for the growth of the two plants and
comment on their growth.
Q. 5. (a) Represent the following data by a suitable
diagram:

Dose of nitrogen        Height of plants in cms. after
in Kgm./Hectare           30 days        70 days
       0                    28             52
      22                    35             64
      44                    40             75
      66                    38             67
      70                    37             63

(b) The following table gives the total outlay on rural
development proposed in the first five year plan and its break-down
into major items. Give a suitable diagram to represent the data.

Item                                   Amount (in crores of rupees)
Agr. & community development                   371.43
Irrigation                                     178.97
Irrigation and power                           276.90
Power                                          138.54
Transport and communication                    508.10
Industry                                       184.04
Social Services                                350.81
Rehabilitation                                  96.00
Miscellaneous                                   62.99
Total                                         2167.78
Q. 6. The consumer's price-index and its three most
important component indexes for a country from 1935 to 1944 are
given in the table below:

Year     Consumer's       Food      Clothing     Rent
         Price Index
1935        98.1         100.4        96.8        94.2
1936        99.1         101.3        97.6        96.4
1937       102.7         105.3       102.8       100.9
1938       100.8          97.8       102.2       104.1
1939        99.4          95.2       100.5       104.3
1940       100.2          96.6       101.7       104.6
1941       105.2         105.5       106.3       106.2
1942       116.5         123.9       124.2       108.5
1943       123.6         138.0       129.7       108.0
1944       125.5         136.1       138.8       108.2

Using the graphical method, comment particularly in respect of
the periods 1937-38, 1939-40 and 1941-44.

(M. Sc. Ag. Eco. Agra, 1965)

Q. 7. The following data give the marks obtained by a batch
of candidates in a certain examination in Mathematics and
Statistics. In which subject is the level of knowledge of the
candidates higher? Give reasons.

Marks in Maths. : 14, 16, 16, 14, 22, 13, 15, 24, 12, 23, 14, 20, 17

Marks in Stat.  : 21, 22, 18, 18, 19, 20, 17, 16, 15, 11, 12, 21, 20

Q. 8. (a) Define the median and its utility. Calculate the
median from the following data:

Class interval : (class limits illegible in the source)
Frequency      : 42, 48, 47, 40, 41, 21, 5, 2, 2, 2

(b) Also compute the quartiles and the quartile-deviation from
the above data. (M. Sc. Ag. Eco. Agra, 1965)

• 

Q. 9. The following table gives the yield of wheat from 10
equal plots:

Plot No.        :  1   2   3   4   5   6   7   8   9  10   Total
Yield in kgms.  : 60  40  50  45  60  55  65  50  65  55    545

If the area of each plot is 242 square yards, find the average
yield per acre. Also calculate the coefficient of variation (1 acre
= 4840 sq. yds.). (U. P. Board, 1963)

Q. 10. The following table gives the marks obtained by some
students. Calculate the arithmetic mean and the mode:

Marks      : 0-10  10-20  20-30  30-40  40-50
Frequency  :   3     13     18     12      5

(U. P. Board, 1963)

Q. 11. An analysis of the monthly wages paid to the workers
in two firms A and B belonging to the same industry gives the
following results:

                                        Firm A       Firm B
No. of wage earners                 :     586          648
Average monthly wages               :   Rs. 52.5     Rs. 47.5
Variance of the distribution of wages :   100          121

(a) Which firm pays out the larger amount as monthly wages
to the workers?

(b) In which firm, A or B, is there greater variability in
individual wages?

(c) What is the measure of average monthly wages of all the
workers in the two firms A and B taken together? (I. A. S. 1951)

Q. 12. The following is the distribution of marks secured by 139
candidates. Calculate the S.D. and the coefficient of variation.

Marks (x)      : 0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80   Total
Frequency (f)  :   5     10     20     40     30     20     10      4     139
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Q. 13. Write short notes on the following : 

(i) _ Classification, 

(ii) Histogram, 

(iii) Ogive, 

(iv) Frequency polygon and frequency curve, 

(v) Median and Mode. 

Q. 14 . The scores of two golfers for 10 rounds each are : 

A: 9, 17, 14, 13, 15, 10, 11, 13, 13, 15 

B : 8, 15, 11, 11, 9, 12, 11, 10, 9, 14 

Which player may be regarded as more consistent ? 

Q. 15. The following table gives the heights and dry weights
of 17 plants. Calculate the mean and S. D. for both the characters
and find which of these two varies more.

Heights in cms. (x) :

50, 44, 54, 46, 64, 37, 77, 62, 45, 69, 44, 57, 63, 68, 45, 71, 50

Weights in gms. (y) :

44, 25, 44, 26, 41, 21, 60, 52, 15, 55, 42, 58, 37, 36, 40, 58, 36

Q. 16. In any two series, where $d_1$ and $d_2$ represent the
deviations from the assumed means $A_1 = 50$ and $A_2 = 154$, we have $n_1 = 10$,
$n_2 = 9$, $\Sigma d_1 = 5.8$, $\Sigma d_1^2 = 755.66$, $\Sigma d_2 = 3$, $\Sigma d_2^2 = 55117.0$. Calculate
the coefficients of variation for the two series.

Answers for Exercise 1.


Q. 2. (b) Class Freq. 

5-10 7 

10-15 6 

15-20 5 

20-25 9 

25-30 4 

30-35 4 

35-40 3 

40-45 2 


The boundary-line cases are kept in the classes of which they
are the lower limits.

Q. 7. Statistics (median is higher).

Q. 8. (a) Md. = 53.08.

(b) $Q_1 = 31.916$, $Q_3 = 77.390$, Q. D. = 22.737.

Q. 9. 1090 kgms./acre, C. V. = 14.47.

Q. 10. A. M. = 25.59, Mo. = 24.545.

Q. 11. (a) B, (b) B, (c) Rs. 49.9.

Q. 12. S. D. = 15.6, C. V. = 39.2.

Q. 14. B.

Q. 15. $\bar{x} = 55.65$ cms., $\bar{y} = 40.59$ gms., S. D. (x) = 11.71 cms.,
S. D. (y) = 13.45 gms.; weights vary more.

Q. 16. C. V. (1) = 17.1, C. V. (2) = 50.7.



Chapter IV

Elementary Idea of Probability

4.1 Meaning and Definition.

The term probability is used in three different meanings :

(1) As the subject name, 

(2) As the numerical measure of the chance that an event 
will happen in a single trial, 

(3) As the idea of likelihood in daily conversation in the 
sentences like, 

(a) Most probably he will pass this year, 

(b) It is very likely that his father will come back today. 
The word 'Probability' used in the sense given in (2) is defined
as the limit of the relative frequency as the number of trials
increases indefinitely. If an event 'E' happens 'm' times in 'n'
independent trials performed under the same conditions, then
m/n is known as the relative frequency. The table below gives
the results of various throws of a symmetrical coin :

No. of throws   No. of times head   Relative frequency      | 0.5 - R. F. |
(n)             turned up (m)       (R. F.) = m/n

100             47                  47/100 = 0.47           0.030
200             95                  95/200 = 0.475          0.025
300             145                 145/300 = 0.483         0.017
400             195                 195/400 = 0.4875        0.013
500             256                 256/500 = 0.512         0.012
600             299                 299/600 = 0.498         0.002

From the table we note that as the number of trials increases,
the deviation of the R. F. from 0.5 decreases. Thus, it is expected
that the relative frequency will be very near to 0.5 for some
large number of throws; in other words, the limit of the R. F. is
0.5 as the number of trials increases indefinitely. Hence 0.5 is
the probability of turning up the head, which means that the head
will turn up in 50% of the total throws of a symmetrical coin on
the average in the long run. Now we can say that the probability
is the numerical measure of the chance that in a single trial an
event will happen, and it is defined as the limit of the ratio of
the number of happenings of an event to the total number of trials
performed under the same conditions, provided this limit is unique
and finite. Suppose an event 'E' happens 'm' times in 'n'
independent trials; then

$$P(E) = \lim_{n \to \infty} \frac{m}{n},$$

provided the limit is finite and unique and the trials are
performed under the same set of conditions.

P(E) is estimated by m/n. For example, if in a certain factory
5% of the items are found to be defective on the average during
an inspection of a large number of items, then the probability that
an item selected at random will be defective is 5/100 = 0.05.
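The limiting behaviour of the relative frequency can be imitated on a computer. The following is only a sketch under stated assumptions: the coin is replaced by a pseudo-random number generator, and the function name `relative_frequency`, the seed and the numbers of throws are illustrative choices, not taken from the text.

```python
import random

def relative_frequency(n_throws, p_head=0.5, seed=1):
    """Simulate n_throws of a coin; return (m, m / n_throws), m = number of heads."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    heads = sum(1 for _ in range(n_throws) if rng.random() < p_head)
    return heads, heads / n_throws

if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        m, rf = relative_frequency(n)
        print(f"n={n:6d}  m={m:6d}  R.F.={rf:.4f}  |0.5 - R.F.|={abs(0.5 - rf):.4f}")
```

For a large number of throws the printed deviation from 0.5 is expected to be small, although in any single run it need not shrink at every step.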

4.2 Events :

Compound Events : When an event is decomposable into a
number of simple events, it is called a compound event. For
example, in the simultaneous throw of two unbiased dice, the event
that the sum of the two numbers shown by the upper faces is six
is a compound event, since it can be decomposed
into the following simple events :

(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), where the numbers in each
bracket are shown by the upper faces of the 1st and 2nd die
respectively.
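The decomposition of this compound event can be checked by listing all 36 outcomes of the two dice. This is a small sketch; the variable names are illustrative, and the probability is obtained on the assumption that all 36 outcomes are equally likely (unbiased dice).

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))               # 36 simple events (1st die, 2nd die)
sum_six = [pair for pair in outcomes if sum(pair) == 6]  # simple events forming the compound event
prob_sum_six = Fraction(len(sum_six), len(outcomes))

print(sum_six)        # the five pairs listed in the text
print(prob_sum_six)   # 5/36
```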

Independent Events : Events are said to be independent
when the happening of one does not affect the happening of the
others. In the reverse case, the events are called dependent
events. For example, if a bag contains ten balls, one ball is
drawn from it and not replaced, and then a second ball is
drawn, the probability of the second drawing depends
on the first, and so the two draws (events) are dependent. On the
other hand, if the first ball is replaced, the probability of the second
drawing is independent of the first and so the two drawings (events)
will be independent.

Mutually Exclusive Events : A set of events is said to be
mutually exclusive when the happening of one excludes the
happening of the others, i.e. no two events can occur simultaneously.
For example, the two events, head and tail, cannot
occur together in tossing a single coin (here we have ruled
out the possibility of the coin standing on its edge). Similarly,
the six possible outcomes (events) in throwing a die are mutually
exclusive.

Equally Likely Events : When the probability of happening of
two or more events is the same, they are called equally likely events.
For example, the six possible outcomes of a perfectly symmetrical
die and the two possible outcomes of a perfectly symmetrical
coin are equally likely events. Such a perfectly symmetrical die and
coin are called unbiased.

Exhaustive Events : A set of events is said to be exhaustive
when it includes all the possible outcomes of a trial and it is certain
that one of them will occur.

Compatible Events : A set of events is said to be compatible
when two or more of them can happen simultaneously.

Now we state without proof certain theorems which govern
the probability of compound events.

4.3 Laws of Probabilities :

(1) Addition theorem of probabilities : If $E_1, E_2, \ldots, E_k$
are 'k' mutually exclusive events with their respective probabilities
of happening $p_1, p_2, \ldots, p_k$, then the probability that any one of
them will happen is $(p_1 + p_2 + \ldots + p_k)$. Symbolically,

$$P(E_1 + E_2 + \ldots + E_k) = \sum_{i=1}^{k} p_i$$

(2) Multiplication theorem of probabilities : If $E_1, E_2, \ldots, E_k$
are 'k' independent compatible events with $p_1, p_2, \ldots, p_k$ as the
probabilities of their occurrences respectively, then the probability
of their simultaneous occurrence is $(p_1 \cdot p_2 \ldots p_k)$. Symbolically,

$$P(E_1 \cdot E_2 \ldots E_k) = p_1 \cdot p_2 \ldots p_k = \prod_{i=1}^{k} p_i$$

(3) If $\bar{E}$ is the event opposite to E, then $P(E) + P(\bar{E}) = 1$.
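The three laws can be verified on a simple case by counting outcomes. The sketch below assumes two unbiased dice with 36 equally likely outcomes; the helper `prob` and the particular events chosen are illustrative only.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (first die, second die)

def prob(event):
    """Probability of an event given as a true/false rule on an outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# (1) Addition: "first die shows 1" and "first die shows 2" are mutually exclusive.
p_add = prob(lambda o: o[0] in (1, 2))
assert p_add == prob(lambda o: o[0] == 1) + prob(lambda o: o[0] == 2)

# (2) Multiplication: the results of the two dice are independent.
p_mul = prob(lambda o: o[0] == 1 and o[1] == 3)
assert p_mul == prob(lambda o: o[0] == 1) * prob(lambda o: o[1] == 3)

# (3) Opposite events: P(E) + P(not E) = 1.
p_e = prob(lambda o: sum(o) == 6)
p_not_e = prob(lambda o: sum(o) != 6)
assert p_e + p_not_e == 1
```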

Exp. (1) [a] A number is selected from each of the two
sets 1, 2, 3, 4, 5, 6 and 1, 2, 3, 4, 5, 6; what is the probability that
the number 1 will be selected from the first set and 3 from the
second ?

[b] What is the probability that the sum of the two numbers
selected one from each set is 5 ?

OR

Two dice are thrown simultaneously.

[a] What is the probability that 1 will be shown by the
upper face of the first die and 3 by that of the second ?

[b] What is the probability that the sum of the two numbers
shown by the upper faces of the two dice is 5 ?

Sol : [a] There are six numbers in the first set, and in a random
selection every number has an equal chance
of being selected. Hence, $P(1) = \frac{1}{6}$.

Similarly, $P(3) = \frac{1}{6}$.

The selection of (1, 3) is a compound event composed of two
simple events which are independent.

Hence the probability of their simultaneous occurrence is, by the
multiplication theorem of probabilities,

$$P(1, 3) = P(1) \cdot P(3) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$$ Ans.

[b] The sum of the two numbers, one selected from each set,
would be 5 in the following 4 ways :

(1, 4), (2, 3), (3, 2), (4, 1).

Arguing in the same manner as above, we have :

$$P(1, 4) = \frac{1}{36}, \quad P(2, 3) = \frac{1}{36}, \quad P(3, 2) = \frac{1}{36} \quad \text{and} \quad P(4, 1) = \frac{1}{36}.$$

Since these four ways are mutually exclusive, the desired
probability is, by the addition theorem,

$$P[(1, 4) + (2, 3) + (3, 2) + (4, 1)] = \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{4}{36} = \frac{1}{9}$$ Ans.

Exp. (2) Two cards are drawn from a well-shuffled deck of
cards. What is the probability that both the extracted cards are aces ?

Sol : There are 52 cards in the deck and 4 of them are aces.
In the drawing of the first card the probability that
the extracted card is an ace is 4/52. After the first card has been
drawn, the second extracted card may be any one of the remaining
51 cards containing 3 aces. The probability that the second extracted
card will be an ace is 3/51. Hence the probability that both
the extracted cards are aces is

$$\frac{4}{52} \times \frac{3}{51} = \frac{1}{13} \times \frac{1}{17} = \frac{1}{221}$$ Ans.

Exp. (3) Two cards are drawn from a full pack with replacement.
What is the probability that both the extracted cards are of
a specified suit ?

Sol. : In the first draw, there are 52 cards in all, and 13 of
them are of the specified suit. Hence the prob. that the first
extracted card will belong to the specified suit is 13/52. In the second
draw, the total no. of cards is again 52, since the first extracted
card has been replaced, and the no. of cards of the specified suit
is again 13, so the prob. that the second extracted card will belong
to the specified suit is 13/52. Now the probability
that both the extracted cards belong to the specified suit is

$$\frac{13}{52} \times \frac{13}{52} = \frac{1}{4} \times \frac{1}{4} = \frac{1}{16}$$ Ans.
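The only difference between Exp. (2) and Exp. (3) is whether the first card goes back into the pack. The following sketch makes that explicit; the function `both_favourable` and its arguments are illustrative names, and exact fractions are used so that the results can be compared with the worked answers.

```python
from fractions import Fraction

def both_favourable(total, favourable, replace):
    """P(both of two successive draws are favourable cards)."""
    first = Fraction(favourable, total)
    if replace:
        second = Fraction(favourable, total)          # pack restored before the 2nd draw
    else:
        second = Fraction(favourable - 1, total - 1)  # one favourable card already removed
    return first * second

p_two_aces = both_favourable(52, 4, replace=False)    # Exp. (2): without replacement
p_same_suit = both_favourable(52, 13, replace=True)   # Exp. (3): with replacement
print(p_two_aces, p_same_suit)
```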

Exp. (4) : An urn contains 4 white and 6 black balls.

(a) One ball is drawn at random; what is the probability
that it is black ?

(b) A second ball is drawn after the first (without replacement)
and the colour of the first extracted ball was not noted. What is
the probability that the second ball is black ?

(c) What is the probability that the second extracted ball is
black when the colour of the first ball was noted as black ?

Sol. (a) We may imagine that the balls are numbered from
1 to 10 in such a way that the white balls bear the numbers 1, 2,
3, 4 and the black ones 5, 6, 7, 8, 9 and 10. Extraction of one
black ball means the selection of any one number out of 5, 6, 7, 8,
9 and 10, and it may be done in six ways. Thus the prob. that a randomly
selected ball is black will be 6/10 = 0.6. Ans.

(b) The colour of the first extracted ball may be
(i) white with prob. 4/10, or (ii) black with prob. 6/10.

(i) If the colour of the first ball is white and it is not replaced,
the prob. that the second extracted ball is black will be
6/9. Hence the prob. that the first extracted ball is white and the
second black, when the first ball is not replaced, is

$$\frac{4}{10} \times \frac{6}{9} = \frac{2}{5} \times \frac{2}{3} = \frac{4}{15}$$

(ii) If the colour of the first extracted ball is black and it is
not replaced, the prob. that the second extracted ball is
black will be 5/9. Hence the prob. that the first extracted ball is
black and the second also black, when the first ball is not
replaced, is

$$\frac{6}{10} \times \frac{5}{9} = \frac{1}{3}$$

The desired prob. is the sum of the probs. obtained in (i) and
(ii), i.e.

$$\frac{4}{15} + \frac{1}{3} = \frac{4 + 5}{15} = \frac{3}{5}$$ Ans.

(c) When the first extracted ball is not replaced and its
colour is black, the prob. that the second extracted ball will also
be black is 5/9. Ans.
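Part (b) adds the probabilities of two mutually exclusive routes to a black second ball. The sketch below repeats that reasoning with exact fractions. It assumes the urn holds 4 white and 6 black balls, which is what the numbering from 1 to 10 in the solution implies; all variable names are illustrative.

```python
from fractions import Fraction

white, black = 4, 6            # assumed composition: 10 balls in all
total = white + black

p_first_white = Fraction(white, total)
p_first_black = Fraction(black, total)            # part (a)

# Chance of black on the 2nd draw, first ball not replaced.
p_black_after_white = Fraction(black, total - 1)
p_black_after_black = Fraction(black - 1, total - 1)   # part (c)

# Part (b): colour of the first ball unknown, so add the two routes.
p_second_black = (p_first_white * p_black_after_white
                  + p_first_black * p_black_after_black)
print(p_first_black, p_second_black, p_black_after_black)
```

The result of (b) equals that of (a): when the first colour is not noted, the second ball is as likely to be black as the first.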


Exp. (5) : A problem is given to 3 students whose chances
of solving it are 1/2, 1/3 and 1/4. What is the probability that the problem
will be solved ?

Sol : If E denotes the event that the problem will be solved,
then $\bar{E}$ will be the event that it will not be solved. The problem
will not be solved when each student fails to solve it.

The prob. that it will not be solved by the first student
is 1 - 1/2 = 1/2. Similarly the prob. that it will not be solved
by the second student is 1 - 1/3 = 2/3 and that by the third is
1 - 1/4 = 3/4.

The prob. that none will be able to solve the problem is

$$P(\bar{E}) = \frac{1}{2} \times \frac{2}{3} \times \frac{3}{4} = \frac{1}{4}$$

Hence, $P(E) = 1 - P(\bar{E}) = 1 - \frac{1}{4} = \frac{3}{4}$. Ans.


Exp. (6) : The probability that a worker selected at random
from a certain factory is male is 0.6 and that a worker is
married is 0.7. Find the probability that a worker selected at random

(i) is a married male,

(ii) is a married female, and

(iii) is a single female.

Sol. If A denotes the event that a randomly selected worker
is a male, then $\bar{A}$ will be the event that the worker is a female.
Similarly, if B stands for a married worker, then $\bar{B}$ stands for a
single worker. We have been given

P(A) = 0.6, P(B) = 0.7.

Since we know that $P(A) + P(\bar{A}) = 1$,

or $P(\bar{A}) = 1 - P(A) = 0.4$,

and $P(B) + P(\bar{B}) = 1$,

or $P(\bar{B}) = 0.3$.

Hence, taking sex and marital status to be independent,

(i) $P(AB) = P(A) \cdot P(B) = 0.6 \times 0.7 = 0.42$ Ans.

(ii) $P(\bar{A}B) = P(\bar{A}) \cdot P(B) = 0.4 \times 0.7 = 0.28$ Ans.

(iii) $P(\bar{A}\bar{B}) = P(\bar{A}) \cdot P(\bar{B}) = 0.4 \times 0.3 = 0.12$ Ans.


Exp. (7) : In a cotton F2 segregating for leaf shape (narrow
versus broad) and flower colour (yellow versus white), the characters
are controlled by single gene pairs segregating independently and
exhibiting dominance of narrow leaf and yellow flower. What is
the probability of a plant selected randomly from this F2 possessing

(i) narrow leaves,

(ii) narrow leaves and yellow flowers,

(iii) broad leaves and white flowers ? (M. Sc. Ag. Agra, 1953)

Sol. We have been given that :

(i) the plants with narrow and broad leaves are in the ratio
3:1,

(ii) the plants with yellow and white flower colour are in the
ratio 3:1, and

(iii) the two characters (shape of leaf and flower colour) are
segregating independently.

The prob. that a randomly selected plant from this F2 possesses :

(i) narrow leaves = 3/4. Ans.

(ii) narrow leaves
and yellow flowers = P (narrow leaves) x P (yellow flowers)
= 3/4 x 3/4 = 9/16. Ans.

(iii) broad leaves
and white flowers = P (broad leaves) x P (white flowers)
= 1/4 x 1/4 = 1/16. Ans.
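The 3:1 ratios used in Exp. (7) can themselves be derived by listing the unions of gametes. The sketch below assumes the F2 comes from a doubly heterozygous F1; the gene symbols N/n (narrow/broad leaf) and Y/y (yellow/white flower) are hypothetical labels introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical symbols: N dominant (narrow leaf), Y dominant (yellow flower).
gametes = ["".join(g) for g in product("Nn", "Yy")]   # NY, Ny, nY, ny
offspring = list(product(gametes, repeat=2))          # 16 equally likely unions

def phenotype(pair):
    genes = pair[0] + pair[1]
    leaf = "narrow" if "N" in genes else "broad"      # one dominant allele suffices
    flower = "yellow" if "Y" in genes else "white"
    return leaf, flower

def prob(condition):
    hits = sum(1 for pair in offspring if condition(phenotype(pair)))
    return Fraction(hits, len(offspring))

p_narrow = prob(lambda ph: ph[0] == "narrow")
p_narrow_yellow = prob(lambda ph: ph == ("narrow", "yellow"))
p_broad_white = prob(lambda ph: ph == ("broad", "white"))
print(p_narrow, p_narrow_yellow, p_broad_white)
```

The counts agree with the products given by the multiplication theorem, as is to be expected when the two gene pairs segregate independently.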
4.4 Binomial Law of Probability :

If the probability of happening of an event in a single trial
is 'p' and that of failure is 'q', then the probability that out of 'n'
independent trials performed under the same conditions, the
event will happen 'x' times and fail to happen (n - x) times, is
given by

$$P(x) = {}^{n}C_{x}\, p^{x} q^{n-x}, \quad \text{where } p + q = 1 \text{ and } {}^{n}C_{x} = \frac{n!}{x!\,(n-x)!}.$$

The possible outcomes of the above trials are x = 0, 1, 2, ..., n.
Thus it is a set of (n + 1) events such that one of them is certain
to happen, so the sum of their probabilities is unity, i.e.

$$P(0) + P(1) + P(2) + \ldots + P(n) = 1.$$

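Exp. No. (8) : An unbiased coin is tossed 3 times. What is
the probability that the head will occur

(i) 0 time, (ii) 1 time,

(iii) 2 times and (iv) 3 times ?

Sol. Here n = 3, p = q = 1/2. If P(x) denotes the prob. that the
head turns up 'x' times, then

(i) $P(0) = {}^{3}C_{0} \left(\tfrac{1}{2}\right)^{0} \left(\tfrac{1}{2}\right)^{3} = 1 \times 1 \times \tfrac{1}{8} = \tfrac{1}{8}$ Ans.

(ii) $P(1) = {}^{3}C_{1} \left(\tfrac{1}{2}\right)^{1} \left(\tfrac{1}{2}\right)^{2} = 3 \times \tfrac{1}{2} \times \tfrac{1}{4} = \tfrac{3}{8}$ Ans.

(iii) $P(2) = {}^{3}C_{2} \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{1} = 3 \times \tfrac{1}{4} \times \tfrac{1}{2} = \tfrac{3}{8}$ Ans.

and (iv) $P(3) = {}^{3}C_{3} \left(\tfrac{1}{2}\right)^{3} \left(\tfrac{1}{2}\right)^{0} = 1 \times \tfrac{1}{8} \times 1 = \tfrac{1}{8}$ Ans.

Exp. No. (9) : Five aeroplanes fly together. What is the
probability that all will reach their destination safely, when
the probability of disturbance in any one of them is 0.1 ?

Sol. Here n = 5, q = 0.1, p = 1 - q = 0.9.

If P(x) denotes the prob. that x planes will reach their
destination safely, then

$$P(5) = {}^{5}C_{5}\, (0.9)^{5} (0.1)^{5-5} = 1 \times (0.9)^{5} \times 1 = 0.59049$$ Ans.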
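The binomial law is easily put into a short routine, which may be used to check Exp. (8) and Exp. (9). This is a minimal sketch; the name `binomial_pmf` is an illustrative choice, and `math.comb` (Python 3.8 or later) supplies the number of combinations.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Exp. (8): three tosses of an unbiased coin, exact fractions.
coin = [binomial_pmf(x, 3, Fraction(1, 2)) for x in range(4)]
print(coin, sum(coin))          # the four probabilities add up to unity

# Exp. (9): all five aeroplanes arrive safely, p = 0.9.
all_safe = binomial_pmf(5, 5, 0.9)
print(round(all_safe, 5))
```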

4.5 Normal Law of Probability :

When the no. of trials increases indefinitely and neither 'p'
nor 'q' is very small, then the limiting form of the Binomial Law
is the normal law given by

$$f(x) = \frac{1}{\sqrt{2\pi npq}}\, e^{-(x - np)^2 / 2npq},$$

which on putting np = m and $\sqrt{npq} = \sigma$ takes the general form

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - m)^2 / 2\sigma^2},$$

where m and $\sigma^2$ are the mean and variance of the variate x.
Here x is a continuous variate which
ranges from $-\infty$ to $+\infty$. The frequency curve of the normal law
is of the form :

Fig. No. (1)

It is symmetrical about the mean 'm'. In fact the mean, median
and mode coincide in this case. The prob. that 'x' will take a
value between 'a' and 'b' is the shaded area of the above curve. For
the standard normal law (with mean zero and variance unity) such
probabilities are given in 'Fisher & Yates Statistical Tables'. Since
the normal law is found to hold for a large mass of agricultural
data, it is used very widely in connection with agricultural
experiments, especially when we deal with large data.

Exp. No. (10) : What is probability ? For a character
distributed normally with unit variance, what are the probabilities
of occurrence of individuals exceeding the mean value by 1 or
more, 2 or more, and 3 or more units ?

(M. Sc. Ag. Agra, 1956)

Sol. : For definition see theory.

Fig. No. (2)

(i) The prob. that the individual 'x' will exceed the mean 'm'
by 1 or more is the shaded area of the curve, which is found
from the normal tables, i.e. $P\{(x - m) \ge 1\} = 0.1587$.

Similarly (ii) $P\{(x - m) \ge 2\} = 0.0228$,
and (iii) $P\{(x - m) \ge 3\} = 0.00135$.
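In place of printed tables, the upper-tail areas of the standard normal curve can be computed from the complementary error function. The sketch below uses the standard identity P(Z >= z) = (1/2) erfc(z / sqrt 2); the function name `upper_tail` is an illustrative choice.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z >= z) for a standard normal variate Z (mean 0, variance 1)."""
    return 0.5 * erfc(z / sqrt(2))

tails = {k: upper_tail(k) for k in (1, 2, 3)}
for k, area in tails.items():
    print(f"P(x - m >= {k}) = {area:.5f}")
```

With unit variance the deviations 1, 2 and 3 are already in standard units; for any other s. d. the deviation would first be divided by it.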

Poisson Law of Probability :

If p or q is very small and np (= m, say) is a constant, then
the limiting form of the Binomial Law is the Poisson Law, given by

$$P(x) = \frac{e^{-m}\, m^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

Here x denotes the no. of successes, which is a discrete variate
ranging from 0 to $\infty$. Examples are the no.
of misprints per page in a book, or the no. of road accidents per
day on a particular highway. It should be noted that the mean
and the variance of this Poisson variate x are the same, i.e. m.
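Both statements about the Poisson law (equal mean and variance, and its origin as a limit of the binomial law) can be checked numerically. The sketch assumes the usual form P(x) = e^(-m) m^x / x!; the value m = 2, the choice n = 1000 and the function names are illustrative and not taken from the text.

```python
from math import comb, exp, factorial

def poisson_pmf(x, m):
    """P(x) = e**(-m) * m**x / x!"""
    return exp(-m) * m**x / factorial(x)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

m = 2.0                       # illustrative mean
xs = range(60)                # terms beyond x = 60 are negligible for m = 2
mean = sum(x * poisson_pmf(x, m) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, m) for x in xs)
print(round(mean, 6), round(variance, 6))     # both close to m

# Binomial with large n and small p, keeping np = m fixed.
n = 1000
gap = max(abs(binomial_pmf(x, n, m / n) - poisson_pmf(x, m)) for x in range(11))
print(gap)                    # largest difference between the two laws for x = 0..10
```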

Exercise No. IV

1. How will you define probability ? Give its uses.

2. What do you understand by binomial variate and normal
variate ? Describe the important properties of the normal
probability curve and give its importance.

3. Give the equation for a normal curve whose mean is zero
and standard deviation is unity. What are the properties of
this curve ?

4. Give without proof the laws of probability.

5. What is the binomial law of probability ? Using it, find the
probability that all the five games will be won by player
A when the probability of missing a game is 0.1.

(Ans. p = 0.59)

6. What is the equation for a standard normal curve ? Using
it, find the total number of candidates out of 1000 who
obtain more than 80% marks in an examination in which
the average of marks is 50 and standard deviation is 10.

(Ans. 27)

7. How will you define an event ? What do you mean by
independent and mutually exclusive events ? Give at least
two examples for each.



Chapter V

Tests of Significance

5.1 Introduction.

The aim of any statistical inquiry is to draw conclusions
regarding the population characteristics (called the parameters),
like the mean and standard deviation, by studying the sample mean and
sample s. d. etc., i.e. arriving at generalizations from a study of
particular cases. In doing this, we must know the mathematical
form of the population to be studied and the technique of sampling.
Fortunately, the normal distribution fits a large mass of agricultural
data. For the present discussion, the sample will be considered a
simple random sample throughout this chapter, i.e. the theory is
based upon the assumption of normality of the parent population
and simple random sampling. In drawing the conclusions
regarding the population from the sample observations, we are
faced with two types of problems. These are :

(1) to estimate the unknown parameters of the parent
population by some suitable function of the sample
observations (statistics),

(2) to compare these calculated values and to find how far the
difference between the two values can be attributed to the
fluctuations of simple random sampling (chance).

The former problem is called the problem of 'Estimation', while the
latter is the problem of 'Testing of Hypothesis'. A statistical hypothesis
is a statement which specifies the value of one or more parameters
or gives the relation between two or more parameters. Any
statistical hypothesis under test is called a null hypothesis, and it is
so set up that the difference between the two values to be compared
is due to chance alone. In applying a test of significance, we
calculate the probability of occurrence of a difference equal to or more
than the observed difference between the two values to be compared,
under the assumption that the null hypothesis is true. In advance
of the experiment, we fix the value of such a probability, which
is used as a line of demarcation between the rejection and the
acceptance of the null hypothesis, called the level of significance.
The levels of significance used to test hypotheses are generally 5% and 1%
in practice. If the calculated probability is less than the
level of significance, the null hypothesis will be rejected; otherwise
it will be accepted. If the probability of getting a difference
equal to or more than the observed difference between the two
values to be compared is less than 0.05, then the difference is said
to be significant at the 5 percent level of significance, and when this
probability is greater than or equal to 0.05, the difference is not
significant at the 5 percent level of significance. At the 5 percent
level of significance, a true hypothesis will be rejected 5 times in
100 on the average in the long run.
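The last statement can be illustrated by repeated sampling from a population for which the null hypothesis is true. The sketch below is an illustration only: it assumes a normal population with mean 50 and s. d. 10 (arbitrary values), samples of size 25, and the rule used later in this chapter of rejecting when the ratio of the deviation of the sample mean to its standard error reaches 1.96.

```python
import random
from math import sqrt

def rejection_rate(n_experiments=20000, n=25, mu=50.0, sigma=10.0, seed=7):
    """Fraction of samples from a TRUE null population rejected at the 5% level."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_experiments):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = abs(sample_mean - mu) / (sigma / sqrt(n))
        if z >= 1.96:            # 5 percent level of significance
            rejected += 1
    return rejected / n_experiments

if __name__ == "__main__":
    print(rejection_rate())      # expected to lie near 0.05
```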

5.2 Sampling Distribution. If we take a number of samples
from the specified population and calculate some statistic such as the
mean or the s. d., the obtained values will form a series of different
values. These values, under certain considerations, may be grouped in the
form of a frequency distribution. If the no. of samples drawn becomes
larger and larger, the frequency distribution tends to a continuous
distribution, which very often is a normal distribution. This
distribution is called the Sampling Distribution. The test of
significance is carried out by calculating the probability of observing
the difference between the two values to be compared. This value
can be found only when we have an idea of the sampling distribution
of the statistic to be used in testing the significance. We give below
the sampling distributions of a few statistics with the standard
errors of the differences between the two values to be compared.

(1) $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ follows a standard normal distribution,
where $\bar{x}$ is the sample mean, $\mu$ the population mean,
$\sigma$ the s. d. of the population and n the no. of observations
in the sample.

S. E. of $(\bar{x} - \mu) = \sigma/\sqrt{n}$.

(2) $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the 't' distribution with (n - 1) d. f.,
n < 30, where 's' is the s. d. of the sample given by
$s^2 = \dfrac{\Sigma (x - \bar{x})^2}{n - 1}$, which is an unbiased estimate of $\sigma^2$.

(Due to 'Student', W. S. Gosset)

(3) $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows a standard normal distribution when
n is large.

(4) $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ follows a standard normal
distribution, where $\bar{x}_1$ and $\bar{x}_2$ are the means of the 1st and 2nd
samples respectively, of sizes $n_1$, $n_2$, and $\sigma_1^2$, $\sigma_2^2$ are the
variances of the two parent populations.

S. E. of $(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.

(5) $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ follows a 't' distribution with
$(n_1 + n_2 - 2)$ d. f.,
where $s^2$ is the pooled estimate of the population variance given by
$s^2 = \dfrac{\Sigma (x_1 - \bar{x}_1)^2 + \Sigma (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$, where $n_1 + n_2 - 2 < 30$.

S. E. of $(\bar{x}_1 - \bar{x}_2) = s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$.

(Due to Prof. R. A. Fisher)

(6) $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ follows a standard normal
distribution, where $s_1^2$ and $s_2^2$ are the sample variances given by
$s_1^2 = \dfrac{\Sigma (x_1 - \bar{x}_1)^2}{n_1 - 1}$ and $s_2^2 = \dfrac{\Sigma (x_2 - \bar{x}_2)^2}{n_2 - 1}$.

S. E. of $(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, where $n_1$ and $n_2$ are large.

(7) $F = \dfrac{s_1^2}{s_2^2}$ (if $s_1^2 > s_2^2$) follows the 'F' distribution with
$\nu_1 = (n_1 - 1)$ and $\nu_2 = (n_2 - 1)$ d. f. (Due to Prof. G. W. Snedecor)

(8) $\chi^2 = \Sigma \dfrac{(O - E)^2}{E}$ follows a $\chi^2$ distribution with $\nu$ d. f.,
where $\nu$ is the no. of cells whose frequencies can be
determined independently. Also, O and E are the observed
and hypothetical (expected) frequencies respectively.

(Due to K. Pearson)
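The idea of a sampling distribution, and the standard error in (1), can be seen by actually drawing many samples. The following is a sketch under assumed values (normal population with mean 50 and s. d. 10, samples of size 16, 5000 samples); none of these figures comes from the text, and the function name is illustrative.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_sample_means(n_samples=5000, n=16, mu=50.0, sigma=10.0, seed=3):
    """Draw n_samples simple random samples of size n; return their means."""
    rng = random.Random(seed)
    return [mean([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(n_samples)]

means = simulate_sample_means()
observed_se = pstdev(means)            # s. d. of the simulated sample means
theoretical_se = 10.0 / sqrt(16)       # sigma / sqrt(n) = 2.5
print(round(mean(means), 2), round(observed_se, 2), theoretical_se)
```

The s. d. of the simulated means is expected to lie close to the standard error given by the formula, and their average close to the population mean.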

5.3 Degrees of freedom :

The number of independent variables used in the computation
of a statistic is called its degrees of freedom. If the total number
of variables is n and they are subject to k restrictions, then
the degrees of freedom are (n - k).

5.4 Testing the significance of the difference of two means :

It consists of dividing the difference of the two means by the
standard error of the difference and then finding the probability
of observing this ratio. The computation of the s. e. and of
the above-mentioned probability depends upon the following
information :

(i) Whether the s. d. of the parent population is known or
unknown.

(ii) Whether the sample sizes are small or large.

Samples of size greater than 30 are considered large
samples. The theory will be discussed for the following situations :

(i) When '$\sigma$' is known, whatever be the sample size (large
or small).

(ii) When '$\sigma$' is unknown and the sample is large.

(iii) When '$\sigma$' is unknown and the sample is small.

5.4.1 When '$\sigma$' is known and the sample is of any size : We shall
discuss here two types of problems :

(a) To test whether the observed sample mean ($\bar{x}$) is
significantly different from a specified population (universe) mean
($\mu$), i.e. to test whether a given sample has been taken from a
population of specified mean.

(b) To test the significance of the difference, viz. ($\bar{x}_1 - \bar{x}_2$),
between the two sample means $\bar{x}_1$ and $\bar{x}_2$, say.

Case (a) : Let us suppose that $x_1, x_2, \ldots, x_n$ constitute a
simple random sample of size 'n' from a normal population with
mean $\mu$ and standard deviation $\sigma$.

To test the null hypothesis (denoted by N. H. or $H_0$) 'that
the sample has been taken from a specified population' (i.e. to
test the significance of the difference $\bar{x} - \mu$), we compute the
statistic

$$z = \frac{|\bar{x} - \mu|}{\sigma/\sqrt{n}}.$$

The sampling distribution of the ratio $(\bar{x} - \mu)/(\sigma/\sqrt{n})$, of which
z is the absolute value, is normal with mean
zero and variance unity. Hence the table of the standard normal
variate can be used to determine the probability of
observing the value of z. If this probability is greater than 0.05,
the hypothesis ($H_0$) will not be rejected at the 5 percent level of
significance. The same can be done by comparing z against 1.96.
If $z \ge 1.96$, the probability of observing z is less than or equal to
0.05, leading to the rejection of the null hypothesis, and if z < 1.96,
the probability of observing z is greater than 0.05, leading to the
acceptance of the null hypothesis. In short, the test is carried
out as follows :

Compute $z = \dfrac{|\bar{x} - \mu|}{\sigma/\sqrt{n}}$.

If $z \ge 1.96$, reject the N. H.,

but if z < 1.96, there is no evidence against the hyp. at the 5% level
of significance.

The above test is based upon the following assumptions :

(1) The parent population is normal.

(2) The sample is a simple random sample.

(3) The s. d. of the population is known.

Case (b) : Let us suppose that $x_1, x_2, \ldots, x_{n_1}$ is a s. r. s. of
size $n_1$ from a normal population with mean $\mu_1$ and s. d. $\sigma_1$, and a
second sample of size $n_2$ is also a s. r. s. drawn from another normal
population with mean $\mu_2$ and s. d. $\sigma_2$. To test the hypothesis
that the two samples have been taken from two different
populations with the same mean, i.e. $\mu_1 = \mu_2$, we compute the
statistic

$$z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$

If $z \ge 1.96$, reject the N. H.,

but if z < 1.96, there is no evidence against the hyp. at the 5 percent
level of significance.

If the test is to be carried out at the 1 percent level of significance,
then z is compared against 2.58 instead of 1.96.

The assumptions involved in the above test are :

(1) The two parent populations are normal.

(2) The samples are simple random samples.

(3) The two samples are independent.

(4) The s. ds. of the two populations are known.

Exp. (1) : [a] A random sample of 25 items is drawn from a
normal population with mean 5.16 and variance 4.0. If the sample
mean is 6.66, can the sample be regarded as drawn from the
specified population ?

[b] A simple random sample of size 9 is drawn from a normal
population with mean $\mu_1$ (unknown) and s. d. 4.5. Another simple
random sample of size 10 is drawn from a normal population with
mean $\mu_2$ (unknown) and s. d. 4.66. Test whether the means of
the two populations are equal, given that the sample means are 68.0
and 69.2 respectively.

Sol. [a] : $H_0$ : The sample has been taken from the specified
population, i.e. $\mu = 5.16$ and $\sigma^2 = 4$.

Now we compute the statistic,

$$z = \frac{|\bar{x} - \mu|}{\sigma/\sqrt{n}} = \frac{|6.66 - 5.16|}{2/\sqrt{25}} = \frac{1.50 \times 5}{2} = \frac{7.5}{2}$$

$\therefore$ z = 3.75, i.e. > 1.96.

Thus we reject the N. H. at the 5 percent level.

Conclusion : Therefore, we conclude that the sample has not
been drawn from the specified population.

[b] $H_0$ : $\mu_1 = \mu_2$.

Here we compute the statistic,

$$z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{|68.0 - 69.2|}{\sqrt{\dfrac{(4.5)^2}{9} + \dfrac{(4.66)^2}{10}}} = \frac{1.20}{\sqrt{4.42}} = \frac{1.20}{2.1}$$

$\therefore$ z = 0.57, i.e. < 1.96.

Thus we have no evidence against the hyp. at the 5% level.

Result : Therefore, we conclude that the two samples have
been drawn from two different populations with the same mean.
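The two tests of this section reduce to two short formulae, which can be coded and checked against Exp. (1). This is a sketch only: the function names are illustrative, and the figures are entered as they are read from the example (mean 5.16, variance 4, sample mean 6.66 and n = 25 for [a]; s. ds. 4.5 and 4.66, sizes 9 and 10, and means 68.0 and 69.2 for [b]).

```python
from math import sqrt

def z_one_sample(sample_mean, mu, sigma, n):
    """z = |x-bar - mu| / (sigma / sqrt(n)), population s. d. known."""
    return abs(sample_mean - mu) / (sigma / sqrt(n))

def z_two_sample(mean1, mean2, sigma1, sigma2, n1, n2):
    """z = |x-bar1 - x-bar2| / sqrt(sigma1^2/n1 + sigma2^2/n2), s. ds. known."""
    return abs(mean1 - mean2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)

z_a = z_one_sample(6.66, 5.16, 2.0, 25)            # Exp. (1) [a]
z_b = z_two_sample(68.0, 69.2, 4.5, 4.66, 9, 10)   # Exp. (1) [b]

for label, z in (("[a]", z_a), ("[b]", z_b)):
    verdict = "reject N. H." if z >= 1.96 else "no evidence against N. H."
    print(f"{label} z = {z:.2f}: {verdict} at the 5% level")
```

For a test at the 1 percent level the same functions apply, with 2.58 in place of 1.96.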

Exp. (2) : The means of simple samples of 1000 and 2000 are
67.50 and 68.0 inches respectively. Can the samples be regarded
as drawn from the same population of standard deviation 2.5
inches ? (M. Sc. Agra, 1954)

Sol. $H_0$ : The samples have been taken from the same universe, i.e.
$\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2 = 2.5$.

Now we compute the statistic (since $\sigma_1 = \sigma_2 = \sigma$ is known)

$$z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{|67.50 - 68.0|}{2.5\sqrt{\dfrac{1}{1000} + \dfrac{1}{2000}}} = \frac{0.50}{0.097} = 5.15.$$

So z > 1.96 and we reject the N. H. at the 5% level.

Conclusion : We therefore conclude that the samples have not
been drawn from the same population.

5.4.2 When '$\sigma$' is unknown and the sample is large :

Case (a) : If it is desired to test the hyp. that a given
sample has been taken from a population of specified mean ($\mu$), we
compute the statistic

$$z = \frac{|\bar{x} - \mu|}{s/\sqrt{n}},$$

where $s^2$ is the unbiased estimate of $\sigma^2$ given by

$$s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1}.$$

If $z \ge 1.96$, reject the N. H.,

but if z < 1.96, accept the N. H. at the 5 percent level of significance.

The assumptions involved in the above test are :

(1) The parent population is normal.

(2) The sample is a s. r. s.

(3) The s. d. of the population is unknown.

(4) The size of the sample is large.

Case (b) : If we want to test the hyp. that the two samples
have been taken from two different populations with the same mean,
i.e. $\mu_1 = \mu_2$, we compute the statistic

$$z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},$$

where $s_1$ and $s_2$ are the two estimates of the population s. ds.
$\sigma_1$, $\sigma_2$ respectively.

If $z \ge 1.96$, reject the N. H.,

but if z < 1.96, accept the N. H. at the 5 percent level of significance.

The assumptions of the test are :

(1) The parent populations are normal.

(2) The two simple random samples are independent.

(3) The s. ds. of the populations are unknown.

(4) The sample sizes are large.

Exp. (3) : A rope manufacturer adopts a new process. The
mean breaking strength of 400 ropes made by it was found to be 125 lbs.
with its s.e. as 10.5. The mean breaking strength of the ropes
manufactured by the old process was 105 lbs. Is the new process
superior to the old one ?

Sol. H0 : The new process of manufacturing the ropes is not
superior to the old one i.e. μ = 105 lbs.

We are given : $\bar{x}$ = 125 lbs., s.e. of $\bar{x}$ = $s/\sqrt{n}$ = 10.5 lbs.,
μ = 105 lbs., and n = 400 (large).
Thus we compute the statistic,

$$ z = \frac{|\bar{x} - \mu|}{s/\sqrt{n}} = \frac{|125 - 105|}{10.5} = \frac{20}{10.5} = 1.905. $$

So z < 1.96 and we have no evidence against the hyp. at 5%
level of significance.

Conclusion : We, therefore, conclude that the new process is
not superior to the old one.
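The arithmetic of Exp. (3) can be checked with a short sketch. The function name `z_one_sample` is only illustrative, and the critical value 1.96 is the 5 percent point used throughout this section.

```python
def z_one_sample(sample_mean, hypothesised_mean, standard_error):
    """Large-sample z statistic: |x-bar - mu| divided by the s.e. of x-bar."""
    return abs(sample_mean - hypothesised_mean) / standard_error


# Figures of Exp. (3): mean 125 lbs., s.e. 10.5 lbs., hypothesised mean 105 lbs.
z = z_one_sample(125.0, 105.0, 10.5)
reject = z > 1.96  # 5 percent critical value for a large sample
print(f"z = {z:.3f}; reject N.H. at 5%: {reject}")
```

Since z falls just short of 1.96, the sketch agrees with the conclusion reached above.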

Exp. (4) : A sample of heights of 6400 soldiers has a mean
of 67.85 inches and a s.d. of 2.56 inches while a simple sample
of heights of 1600 sailors has a mean of 68.55 inches and a standard
deviation of 2.52 inches. Do the data indicate that the sailors are
on the average taller than soldiers ? (B. Sc. Agra, h^5)

Sol. H0 : The sailors are not on the average taller than
soldiers i.e. $\mu_1 = \mu_2$.

Here we compute the statistic,

$$ z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{|67.85 - 68.55|}{\sqrt{\dfrac{(2.56)^2}{6400} + \dfrac{(2.52)^2}{1600}}} = \frac{0.70}{\sqrt{0.004993}} = \frac{0.70}{0.0707} = 9.9. $$

So z > 1.96 and the hyp. is rejected at 5 percent level of significance.

Result : Hence the data indicate that the sailors are on the
average taller than the soldiers.
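A minimal sketch of the two-sample large-sample test, using the figures of Exp. (4). The helper name `z_two_sample` is an assumption made for illustration.

```python
import math


def z_two_sample(mean1, sd1, n1, mean2, sd2, n2):
    """z = |x1-bar - x2-bar| / sqrt(s1^2/n1 + s2^2/n2) for two large samples."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / standard_error


# Soldiers: n = 6400, mean 67.85, s.d. 2.56; sailors: n = 1600, mean 68.55, s.d. 2.52.
z = z_two_sample(67.85, 2.56, 6400, 68.55, 2.52, 1600)
print(f"z = {z:.2f}; reject N.H. at 5%: {z > 1.96}")
```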

Exp. (5) : The mean staple length determined by taking 100
samples from each of two lots of cotton was as follows :
lot A : 25.4 ± 0.8 m.m.
lot B : 26.8 ± 0.7 m.m.

Do the lots differ significantly in their mean staple lengths ?

(M. Sc. Ag. Agra, 1958)
Sol. H0 : The mean staple lengths do not differ
significantly i.e. $\mu_1 = \mu_2$.

Given that $\bar{x}_1$ = 25.4 m.m., s.e. of ($\bar{x}_1$) = 0.8 m.m.,

$\bar{x}_2$ = 26.8 m.m., s.e. of ($\bar{x}_2$) = 0.7 m.m.

Since the samples are large, so we compute the statistic

$$ z = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{s.e. of } (\bar{x}_1 - \bar{x}_2)} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{(\text{s.e. of } \bar{x}_1)^2 + (\text{s.e. of } \bar{x}_2)^2}}, $$

because s.e. of $(\bar{x}_1 - \bar{x}_2) = \sqrt{(\text{s.e. of } \bar{x}_1)^2 + (\text{s.e. of } \bar{x}_2)^2}$.

$$ \text{or } z = \frac{|25.4 - 26.8|}{\sqrt{(0.8)^2 + (0.7)^2}} = \frac{1.4}{\sqrt{1.13}} = \frac{1.4}{1.063} = 1.32. $$

So z < 1.96 and the hyp. is accepted at 5 percent level of significance.

Result : Therefore, we arrive at the conclusion that the mean
staple-lengths do not differ significantly.
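When the two standard errors are quoted directly, as in Exp. (5), the same statistic reduces to the following sketch (the function name is illustrative only).

```python
import math


def z_from_standard_errors(mean1, se1, mean2, se2):
    """z for the difference of two means whose standard errors are given."""
    return abs(mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Lot A: 25.4 +/- 0.8 m.m.; lot B: 26.8 +/- 0.7 m.m.
z = z_from_standard_errors(25.4, 0.8, 26.8, 0.7)
print(f"z = {z:.3f}; reject N.H. at 5%: {z > 1.96}")
```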

5.4.3 When σ is unknown and sample is small :

Case (a) : To test the hyp. that the sample has been taken
from a normal population with specified mean ($\mu_0$) and unknown
s.d., we compute the statistic 'Student-t' as

$$ t = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}}, \quad \text{where } s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}. $$

If $t \ge t_{.05}(n-1)$, reject the hypothesis,
but if $t < t_{.05}(n-1)$, there is no evidence against the hyp. at 5%
level. Here $t_{.05}(n-1)$ stands for the tabulated value of 't' at 5
percent level of significance for (n − 1) d.f.

Assumptions :

(1) The parent universe is normal.

(2) The sample is a simple random sample.

(3) The s.d. of the population is unknown.

(4) The size of the sample is small.

Case (b) : If we want to test the hyp. that the two samples
have been drawn from the same population, we compute the
statistic 'Fisher t' as

$$ t = \frac{|\bar{x}_1 - \bar{x}_2|}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, $$

where s is the pooled estimate of the population s.d. given by

$$ s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}. $$

If $t \ge t_{.05}(n_1 + n_2 - 2)$, reject the N.H.
but if $t < t_{.05}(n_1 + n_2 - 2)$, there is no evidence against the hyp. at
5% level of significance.

The test has the following assumptions :

(1) The populations are normal.

(2) The samples are simple random and independent.

(3) The s.ds. of the two populations are the same but unknown.

(4) The sizes of the samples are small.

Case (c) : We have discussed above in case (b) that "Fisher-t
test" is used only when the two samples are independent. But if
the two samples give an indication of (+)ve correlation i.e. the
observations are paired, then the test is carried out by computing
the statistic "Paired t" as

$$ t = \frac{|\bar{d}|}{s/\sqrt{n}}, $$

where 'd' stands for the difference between the paired values (x, y)
i.e. d = (x − y) say, $\bar{d} = \sum d/n$ and $s^2 = \sum (d - \bar{d})^2/(n-1)$.

If $t \ge t_{.05}(n-1)$, reject the N.H.
but if $t < t_{.05}(n-1)$, there is no evidence against the hyp. at 5%
level of significance.

Assumptions :

(1) The parent population is a bivariate normal.

(2) The two samples show (+)ve correlation.

(3) The s.ds. of x and y are unknown.

(4) The samples are small.

Case (d) : In the cases (b) & (c) discussed above for two
sample problems, we have noted that the population-variances
were unknown but they were supposed to be the same.
Thus for small samples there may arise one more situation where
the population-variances, being unknown, may not be the same. In
such a situation for two small and independent samples 'Fisher's-
t-test' cannot be applied. The exact test for this purpose was first
developed by Behrens and later on by Fisher, and so is called
'Fisher-Behrens-t-test'. The tables for the applications of this test
are prepared by Dr. P. V. Sukhatme. But in using these tables,
interpolation is used a no. of times and so this test is a complicated
one.

In practice, we use an approximate test namely 'Cochran and
Cox-t-test' which is usually carried out after justifying the fact
that $\sigma_1^2 \ne \sigma_2^2$. Thus in testing the hyp. $\mu_1 = \mu_2$, first we should test
the hyp. $\sigma_1^2 = \sigma_2^2$ through the F-test. If F-test gives no evidence
against the hyp. (say at 5%), then we should apply Fisher's-t-test,
otherwise Cochran and Cox-t-test. For Cochran and Cox-t-test,
we compute the statistic

$$ t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

and compare it against

$$ t' = \frac{t_1 \dfrac{s_1^2}{n_1} + t_2 \dfrac{s_2^2}{n_2}}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}. $$

Here t' is a weighted mean of the tabulated
values $t_1 = t_{.05}(n_1 - 1)$ and $t_2 = t_{.05}(n_2 - 1)$ with weights $s_1^2/n_1$ and
$s_2^2/n_2$ respectively, where $s_1^2$, $s_2^2$ have their usual meanings.

If t > t', we reject the N.H. at 5% level ;
but if t < t', there is no evidence against the hyp. at 5% level.

The assumptions involved in the above test are :

(1) The samples are small.

(2) The two samples are simple random and independent.

(3) The parent populations are normal.

(4) The s.ds. of the populations are unknown and different.
Exp. (6) : (a) A certain drug was administered to each of
the 13 patients and it resulted in the gain of sleeping hours as
follows :

−4, 5, 2, 8, −1, 3, 0, 6, −3, 1, 5, 0, 4.

Can it be concluded that the drug will in general be
accompanied by an increase in the sleeping hours ?

(b) Ten individuals are chosen at random from a population
and their heights are found as follows :

63, 63, 66, 67, 68, 69, 70, 70, 71, 71 inches respectively.
Test whether the mean height is 69.6 inches in the population.
Given that for 9 d.f. P(|t| ≥ 2.262) = 0.05.

(M. Sc. Agra, 1962)

Sol. (a) H0 : The drug does not produce any increase in the
sleeping hours of the patients i.e. μ = 0.

Since the sample is small and the s.d. of the population is
unknown, so we compute the statistic $t = \dfrac{|\bar{x} - \mu|}{s/\sqrt{n}}$. The
computations for $\bar{x}$ and s are shown in the following table.



Obs. No.    Gain x (hrs.)    x²
    1           −4           16
    2            5           25
    3            2            4
    4            8           64
    5           −1            1
    6            3            9
    7            0            0
    8            6           36
    9           −3            9
   10            1            1
   11            5           25
   12            0            0
   13            4           16
 n = 13      Σx = 26      Σx² = 206

Computations :

$$ \bar{x} = \frac{\sum x}{n} = \frac{26}{13} = 2 \text{ hrs.} $$

and

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - (\sum x)^2/n}{n - 1} = \frac{206 - (26)^2/13}{12} = \frac{154}{12} = 12.8333, $$

∴ s = √(12.8333) = 3.58 hrs.

Thus we have

$$ t = \frac{|2 - 0|}{3.58/\sqrt{13}} = \frac{2}{3.58/3.6} = \frac{2}{0.994}, $$

or t = 2.01 and $t_{.05}(12)$ = 2.179.

So $t < t_{.05}(12)$, leading to the acceptance of N.H. at 5 percent.

Conclusion : Since the observed value of t is less than the
tabulated value of t at 5 percent for 12 d.f., we conclude that
the drug does not produce any increase in the sleeping hours of the
patients.
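The Student-t computation of part (a) may be verified by the following sketch. It uses only the Python standard library; the function name `t_one_sample` is illustrative, the data are the thirteen gains as tabulated in the solution, and the critical value 2.179 is taken from the text rather than computed.

```python
import math
import statistics


def t_one_sample(sample, mu0):
    """Return (t, d.f.) for testing a specified mean mu0 with unknown s.d."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # unbiased estimate, divisor (n - 1)
    return abs(mean - mu0) / (s / math.sqrt(n)), n - 1


gains = [-4, 5, 2, 8, -1, 3, 0, 6, -3, 1, 5, 0, 4]
t, df = t_one_sample(gains, 0)
print(f"t = {t:.2f} on {df} d.f.; reject N.H. at 5%: {t >= 2.179}")
```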

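(b) H0 : The sample has been taken from the population
of specified mean : μ = 69.6 inches.

Here the sample is small and the s.d. of the population is
unknown, so we shall compute the statistic $t = \dfrac{|\bar{x} - \mu|}{s/\sqrt{n}}$.
For the computations of $\bar{x}$ and s, we prepare the following table.

Obs. No.    Height x    Dev. d = (x − A), A = 67    d² = (x − A)²
    1          63               −4                       16
    2          63               −4                       16
    3          66               −1                        1
    4          67                0                        0
    5          68               +1                        1
    6          69               +2                        4
    7          70               +3                        9
    8          70               +3                        9
    9          71               +4                       16
   10          71               +4                       16
 n = 10                      Σd = 8                   Σd² = 88

Computations :

$$ \bar{x} = A + \frac{\sum d}{n} = 67 + \frac{8}{10} = 67.8, $$

$$ s^2 = \frac{\sum d^2 - (\sum d)^2/n}{n - 1} = \frac{88 - (8)^2/10}{9} = \frac{81.6}{9} = 9.0667, \quad \therefore s = 3.01. $$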


$$ \text{Thus } t = \frac{|67.8 - 69.6|}{3.01/\sqrt{10}} = \frac{1.8}{3.01/3.16} = \frac{1.8}{0.952}, $$

or t = 1.89 and $t_{.05}(9)$ = 2.262.

So $t < t_{.05}(9)$, leading to the acceptance of hyp. at 5% level.

Conclusion : As the observed value of t is less than the value
of t from the table at 5 percent level for 9 d.f., we conclude that
the sample is taken from the population of specified mean.

Exp. (7) : (a) Explain the utility of Student's small sample
theory in biological research.

(b) In a field-experiment with two varieties of wheat grown
in seven pairs of plots, the following yields were obtained :

Variety      Yield in mds./acre
   A         15   14   12   15   16   11   13
   B         11   11   13   12   13   10   12

From the above data, compute the mean difference, standard
deviation of the difference and 't'. What is a reasonable inference
about the population mean difference ? (M. Sc. Ag. Agra, 1964)

Sol. (a) In biological research, the research worker is very
often faced with the problem of testing :

(i) Whether the observed sample mean is significantly
different from some specified value ; e.g. if we are given
the additional hours of sleep gained by 10 patients who
were administered a certain drug and we want to know
whether the drug is helpful in producing the additional
sleep. This problem is simply to test whether the mean
of additional hours of sleep is significantly different
from zero. If this difference is significant then we say that
the drug is helpful in producing the additional sleep.

(ii) Whether the difference between the two sample means is
significant ; e.g. if we are given the mean no. of bacteria in
colonies/plate obtainable by four slightly different
methods from soil samples taken at 4 P.M. and 8 P.M.
respectively and want to know whether the no. of
bacteria at 8 P.M. is more than at 4 P.M., then we
shall have to test the significance of the difference
between the mean nos. of bacteria at 4 P.M. and 8 P.M.

(iii) Whether the correlation and regression coefficients are
significantly different from some specified values.

The limitations of the biological data are as follows :

(1) Their sizes are small.

(2) The population s.d. (σ) is unknown.

The discovery of 't' distribution gave the exact tests for the
problems of type (i), (ii) and (iii) mentioned above in the case
of small samples. Before its discovery, the tests applied to the
above problems in the case of small data were the same as those
for the large data, which were not statistically valid. Thus the
discovery of 't' distribution supplied an exact test applicable to
small as well as to large samples and so introduced the modern era
of statistics.

(b) Here we apply the Paired-t-test and the computations
for it are made from the following table.

Pair No.    Yield for A    Yield for B    d = (A − B)    d² = (A − B)²
    1           15             11             +4              16
    2           14             11             +3               9
    3           12             13             −1               1
    4           15             12             +3               9
    5           16             13             +3               9
    6           11             10             +1               1
    7           13             12             +1               1
  n = 7                                     Σd = 14         Σd² = 46

Computations :

$$ \bar{d} = \frac{\sum d}{n} = \frac{14}{7} = 2 $$

and

$$ s^2 = \frac{\sum (d - \bar{d})^2}{n - 1} = \frac{\sum d^2 - (\sum d)^2/n}{n - 1} = \frac{46 - (14)^2/7}{6} = \frac{18}{6} = 3, \quad \therefore s = \sqrt{3} = 1.732. $$

H0 : The two varieties of wheat are not significantly different
from each other as regards their average yields/plot i.e. $\mu_1 = \mu_2$.

Since the two series are correlated series and the observations
are paired, so we compute the paired-t-statistic as

$$ t = \frac{|\text{mean difference}|}{\text{s.e. of difference}} = \frac{|\bar{d}|}{s/\sqrt{n}} = \frac{2}{1.732/\sqrt{7}} = \frac{2}{1.732/2.646} = \frac{2}{0.6545}, $$

or t = 3.06 and $t_{.05}(6)$ = 2.447.

So $t > t_{.05}(6)$, leading to the rejection of N.H. at 5 percent.

Result : Mean difference = 2.0 md./acre,
s.d. of difference = 1.732 md./acre,
s.e. of difference = 0.6545 md./acre, and t = 3.06.

The reasonable inference about the population means is that
they are not equal. Since the calculated value of 't' exceeds the
tabulated value of 't' at 5 percent level for 6 d.f., so the two varieties
of wheat are significantly different as regards their average yields/plot.
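The paired-t working of part (b) is sketched below under the same formulae (mean and s.d. of the differences with divisor n − 1). The name `paired_t` is an illustrative choice, and 2.447 is the tabulated value quoted in the text.

```python
import math
import statistics


def paired_t(x, y):
    """Return (mean difference, s.d. of differences, t, d.f.) for paired data."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = statistics.mean(d)
    s = statistics.stdev(d)  # divisor (n - 1)
    return d_bar, s, abs(d_bar) / (s / math.sqrt(n)), n - 1


variety_a = [15, 14, 12, 15, 16, 11, 13]
variety_b = [11, 11, 13, 12, 13, 10, 12]
d_bar, s, t, df = paired_t(variety_a, variety_b)
print(f"mean diff = {d_bar}, s = {s:.3f}, t = {t:.2f} on {df} d.f.")
print("reject N.H. at 5%:", t >= 2.447)
```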

Exp. (8) : For a random sample of 10 pigs fed on a diet A,
the increases in weights in a certain period were

10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lbs.

For another random sample of 12 pigs fed on a diet B, the
increases in weights in the same period were :

7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17 lbs.

Find, if the two samples are significantly different regarding
the effects of diets. Given that for d.f. (ν) = 20, 22, 24 the 5 percent
values of t are respectively 2.09, 2.07, 2.06.

(M. Sc. Agra, 1963)

Sol. H0 : The effects of the two diets on the average do not
differ significantly i.e. $\mu_1 = \mu_2$.

Since the samples are independent, small and the s.ds. of
the populations are unknown, so we compute the statistic
'Fisher's-t' as

$$ t = \frac{|\bar{x}_1 - \bar{x}_2|}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}. $$

The sample-means '$\bar{x}_1$, $\bar{x}_2$' and the pooled estimate of population-
variance 's²' may be computed in the following tabular form.


            I-Sample                              II-Sample
Obs. No.   x1    (x1 − x̄1)   (x1 − x̄1)²      x2    (x2 − x̄2)   (x2 − x̄2)²
    1      10       −2            4             7       −8           64
    2       6       −6           36            13       −2            4
    3      16       +4           16            22       +7           49
    4      17       +5           25            15        0            0
    5      13       +1            1            12       −3            9
    6      12        0            0            14       −1            1
    7       8       −4           16            18       +3            9
    8      14       +2            4             8       −7           49
    9      15       +3            9            21       +6           36
   10       9       −3            9            23       +8           64
   11                                          10       −5           25
   12                                          17       +2            4
 Totals   120        0          120           180        0          314

Computations :

$$ \bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{120}{10} = 12 \text{ lbs.}, \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{180}{12} = 15 \text{ lbs.} $$

and

$$ s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{120 + 314}{10 + 12 - 2} = \frac{434}{20} = 21.7, \quad \therefore s = \sqrt{21.7} = 4.66 \text{ lbs.} $$

Thus we have

$$ t = \frac{|12 - 15|}{4.66\sqrt{\frac{1}{10} + \frac{1}{12}}} = \frac{3}{4.66 \times 0.428} = \frac{3}{1.994}, $$

or t = 1.504 and $t_{.05}(20)$ = 2.09.

So $t < t_{.05}(20)$, leading to the acceptance of H0 at 5% level.

Conclusion : Since the observed value of t is less than the
tabulated value of 't' at 5% level for 20 d.f., so there is no evidence
against the hyp. at 5%. It clearly means that the two diets do
not differ significantly as regards the average increase in the weights
of the pigs.
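A sketch of the pooled ('Fisher's-t') computation for Exp. (8) follows. The data are those tabulated in the solution (sums 120 and 180); the function name `pooled_t` is an assumption for illustration, and 2.09 is the tabulated value given in the question.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t, d.f.) for two independent small samples with a common s.d."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)  # sum of squares about mean 1
    ss2 = sum((x - m2) ** 2 for x in sample2)  # sum of squares about mean 2
    df = n1 + n2 - 2
    s = math.sqrt((ss1 + ss2) / df)  # pooled estimate of the s.d.
    return abs(m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2)), df


diet_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]
diet_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]
t, df = pooled_t(diet_a, diet_b)
print(f"t = {t:.3f} on {df} d.f.; reject N.H. at 5%: {t >= 2.09}")
```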

Exp. (9) : Two operators 'A & B' carried out simultaneous
measurements of the percentage of ammonia in a plant-gas on nine
successive days. It is required to know whether the two differed in
their average results under the following situations :

(a) Samples were taken and divided equally between the two
operators for test.

(b) The two operators worked on independent samples.

A : 4, 37, 35, 43, 34, 36, 48, 33, 33

B : 18, 37, 38, 36, 47, 48, 57, 28, 42 (M. A. Patna, 1955)

Sol. (a) H0 : The two operators do not differ significantly in
their average results i.e. $\mu_1 = \mu_2$.

Here the observations are paired and we suspect a (+)ve
correlation between the observations of the two operators.
Further the samples are small and the s.ds. of the populations
are unknown. Thus we compute here the statistic 'Paired-t' as

$$ t = \frac{|\bar{d}|}{s/\sqrt{n}}, $$

where d = difference of paired values, $\bar{d} = \dfrac{\sum d}{n}$ and $s^2 = \dfrac{\sum (d - \bar{d})^2}{n - 1}$.

The mean '$\bar{d}$' and the variance 's²' of the differences of the
paired values may be computed in the following tabular form.

Pair No.    I-Sample Obs. x1    II-Sample Obs. x2    d = x1 − x2     d²
    1              4                   18               −14          196
    2             37                   37                 0            0
    3             35                   38                −3            9
    4             43                   36                +7           49
    5             34                   47               −13          169
    6             36                   48               −12          144
    7             48                   57                −9           81
    8             33                   28                +5           25
    9             33                   42                −9           81
  n = 9                                              Σd = −48     Σd² = 754

Computations :

$$ \bar{d} = \frac{\sum d}{n} = \frac{-48}{9} = -5.33 $$

and

$$ s^2 = \frac{\sum (d - \bar{d})^2}{n - 1} = \frac{\sum d^2 - (\sum d)^2/n}{n - 1} = \frac{754 - (-48)^2/9}{8} = \frac{498}{8} = 62.25, \quad \therefore s = \sqrt{62.25} = 7.89. $$

Thus we have

$$ t = \frac{|-5.33|}{7.89/\sqrt{9}} = \frac{5.33}{7.89/3} = \frac{5.33}{2.63}, $$

or t = 2.03 and $t_{.05}(8)$ = 2.306.

So $t < t_{.05}(8)$, leading to the acceptance of H0 at 5% level.

Conclusion : The two operators do not differ significantly in
their average results.

(b) H0 : The two operators do not differ significantly in their
average results i.e. $\mu_1 = \mu_2$.

Here the samples dealt by the two operators are independent,
small and the s.ds. of the populations are unknown. Thus we
compute the statistic 'Fisher's-t' as

$$ t = \frac{|\bar{x}_1 - \bar{x}_2|}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}. $$

The sample-means '$\bar{x}_1$, $\bar{x}_2$' and the pooled estimate of population-
variance 's²' may be computed in the following tabular form.

            I-Sample                               II-Sample
Obs. No.   x1    d1 = (x1 − A1)    d1²        x2    d2 = (x2 − A2)    d2²
    1       4        −30           900        18        −21           441
    2      37         +3             9        37         −2             4
    3      35         +1             1        38         −1             1
    4      43         +9            81        36         −3             9
    5      34          0             0        47         +8            64
    6      36         +2             4        48         +9            81
    7      48        +14           196        57        +18           324
    8      33         −1             1        28        −11           121
    9      33         −1             1        42         +3             9
 Totals           Σd1 = −3    Σd1² = 1193            Σd2 = 0     Σd2² = 1054

Computations : Let A1 = 34, A2 = 39.

$$ \bar{x}_1 = A_1 + \frac{\sum d_1}{n_1} = 34 + \frac{-3}{9} = 33.67, \qquad \bar{x}_2 = A_2 + \frac{\sum d_2}{n_2} = 39 + 0 = 39, $$

and

$$ s^2 = \frac{\sum d_1^2 - (\sum d_1)^2/n_1 + \sum d_2^2 - (\sum d_2)^2/n_2}{n_1 + n_2 - 2} = \frac{1193 - (-3)^2/9 + 1054 - 0}{9 + 9 - 2} = \frac{1193 - 1 + 1054}{16} = \frac{2246}{16} = 140.375, $$

∴ s = √(140.375) = 11.85.

Thus we have,

$$ t = \frac{|33.67 - 39|}{11.85\sqrt{\frac{1}{9} + \frac{1}{9}}} = \frac{5.33}{11.85 \times 0.47} = \frac{5.33}{5.57}, $$

or t = 0.95 and $t_{.05}(16)$ = 2.12.

So $t < t_{.05}(16)$, leading to the acceptance of H0 at 5% level.

Conclusion : The two operators do not differ significantly as
regards their average measurements.

Exp. (10) : The mean of 12 observations is 5.885 with a s.e.
of 0.0094. The mean of 20 observations by a different method
is 5.855 with a s.e. of 0.0038. Are these means significantly
different ?

Sol. H0 : The two means do not differ significantly i.e. $\mu_1 = \mu_2$.
Given that :

$n_1$ = 12, $n_2$ = 20, $\bar{x}_1$ = 5.885, $\bar{x}_2$ = 5.855,

s.e. of ($\bar{x}_1$) = $\dfrac{s_1}{\sqrt{n_1}}$ = 0.0094, and s.e. of ($\bar{x}_2$) = $\dfrac{s_2}{\sqrt{n_2}}$ = 0.0038.

Here the samples are small, independent and the s.ds. of the
populations are unknown. Thus we shall use either Fisher's-t or
Cochran and Cox-t-test for the purpose.

In order to decide which of these two should be applied here,
we shall first test the equality of variances ($\sigma_1^2 = \sigma_2^2$) by F-test.
Thus we compute,

$$ F = \frac{s_1^2}{s_2^2} = \frac{12 \times (0.0094)^2}{20 \times (0.0038)^2} = \frac{26508}{7220} = 3.67, $$

or F = 3.67, while the tabulated $F_{.05}(11, 19)$ is approximately 2.34.

∴ $F > F_{.05}(11, 19)$, leading to $\sigma_1 \ne \sigma_2$. Hence we should
compute here the Cochran and Cox-t statistic, given by

$$ t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{|5.885 - 5.855|}{\sqrt{0.00008836 + 0.00001444}} = \frac{0.03}{\sqrt{0.0001028}} = \frac{0.03}{0.01014} = 2.96. $$

Now we shall calculate t' as

$$ t' = \frac{t_1 \dfrac{s_1^2}{n_1} + t_2 \dfrac{s_2^2}{n_2}}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}, $$

where $t_1 = t_{.05}(11)$ = 2.201 and $t_2 = t_{.05}(19)$ = 2.093.

$$ t' = \frac{2.201 \times 0.00008836 + 2.093 \times 0.00001444}{0.00008836 + 0.00001444} = \frac{0.0002247}{0.0001028} = \frac{2247}{1028}, $$

or t' = 2.19.

Now we see that t > t', hence it leads to the rejection of
N.H. at 5 percent.

Conclusion : The two sample-means differ significantly at
5 percent level of significance.
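The Cochran and Cox working of Exp. (10) can be sketched as follows. The function name is illustrative; the tabulated values 2.201 and 2.093 are supplied as inputs, as they are in the text, and the squared standard errors play the part of $s_1^2/n_1$ and $s_2^2/n_2$.

```python
import math


def cochran_cox(mean1, se1, t1, mean2, se2, t2):
    """Return (t, t') where t' is the weighted mean of the tabulated t-values."""
    w1, w2 = se1 ** 2, se2 ** 2  # s1^2/n1 and s2^2/n2
    t = abs(mean1 - mean2) / math.sqrt(w1 + w2)
    t_dash = (t1 * w1 + t2 * w2) / (w1 + w2)
    return t, t_dash


t, t_dash = cochran_cox(5.885, 0.0094, 2.201, 5.855, 0.0038, 2.093)
print(f"t = {t:.2f}, t' = {t_dash:.2f}; reject N.H. at 5%: {t > t_dash}")
```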

5.5 Testing the equality of two variances :

The statistic Snedecor's-F is used to test the equality of two
variances i.e. $\sigma_1^2 = \sigma_2^2$. The test consists of computing the ratio

$$ F = \frac{s_1^2}{s_2^2} \quad (\text{provided } s_1^2 > s_2^2). $$

If $F \ge F_{.05}(n_1 - 1, n_2 - 1)$, reject the N.H. at 5%,
but if $F < F_{.05}(n_1 - 1, n_2 - 1)$, accept the N.H. at 5% level.

Here $F_{.05}(n_1 - 1, n_2 - 1)$ is the tabulated value of 'F' at 5
percent level for $(n_1 - 1)$ and $(n_2 - 1)$ d.f.

This test is based on the following assumptions :

(1) The parent populations are normal.

(2) The samples are simple random and independent.

Exp. (11) : Two random samples drawn from two normal
populations are :

I : 20 10 26 27 23 22 18 24 26 19 

II : 27 33 42 • 35 32 34 29 41 43 30 37. 

Obtain the estimates of the variances of the populations and
test whether the two populations have the same variances ?

(I. A. S. 1955)

Sol. (H0) : The two samples have been taken from the
populations of equal variances i.e. $\sigma_1^2 = \sigma_2^2$.

Here we compute the statistic $F = s_1^2/s_2^2$ (provided $s_1^2 > s_2^2$),
where $s_1^2$, $s_2^2$ are the estimates of the variances of I, II populations.

So $F < F_{.05}(11, 9)$, leading to the acceptance of H0 at 5% level.

Conclusion : Since the calculated value of F is less than the
tabulated value of F at 5% level for 11 and 9 d.f., so there is no
evidence against the hyp. at 5%. Therefore, we conclude that the
samples have been taken from the populations of equal variances.

Exp. (12) : Show how you would use the Student's t-test and
Fisher's z-test to decide whether the two sets of observations

17 27 18 25 27 29 27 23 17

and 16 16 20 16 20 17 15 21 indicate samples
drawn from the same universe ? (M. Sc. Agra, 1949)

Sol. (H0) : The two samples have been taken from the same
universe i.e. $\sigma_1^2 = \sigma_2^2$ and $\mu_1 = \mu_2$.

Here we first compute the statistic Fisher's-z = ½ log_e F for
testing the hyp. H01 : $\sigma_1^2 = \sigma_2^2$. If it leads to the acceptance of
H01, then only we need to proceed for Student's-t for
testing the hyp. H02 : $\mu_1 = \mu_2$. The computations for Fisher's-z and
Student's-t-tests are made from the following table.


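            I-Sample                               II-Sample
Obs. No.   x1    d1 = (x1 − 17)    d1²        x2    d2 = (x2 − 20)    d2²
    1      17          0             0        16         −4            16
    2      27         10           100        16         −4            16
    3      18          1             1        20          0             0
    4      25          8            64        16         −4            16
    5      27         10           100        20          0             0
    6      29         12           144        17         −3             9
    7      27         10           100        15         −5            25
    8      23          6            36        21         +1             1
    9      17          0             0
 Totals            Σd1 = 57    Σd1² = 545            Σd2 = −19    Σd2² = 83

Now we get

$$ \bar{x}_1 = A_1 + \frac{\sum d_1}{n_1} = 17 + \frac{57}{9} = 23.33, \qquad \bar{x}_2 = A_2 + \frac{\sum d_2}{n_2} = 20 + \frac{-19}{8} = 17.63. $$

Also,

$$ s_1^2 = \frac{\sum d_1^2 - (\sum d_1)^2/n_1}{n_1 - 1} = \frac{545 - (57)^2/9}{8} = \frac{184}{8} = 23, $$

$$ s_2^2 = \frac{\sum d_2^2 - (\sum d_2)^2/n_2}{n_2 - 1} = \frac{83 - (-19)^2/8}{7} = \frac{37.875}{7} = 5.41. $$

To test the hyp. H01 : $\sigma_1^2 = \sigma_2^2$, we compute Fisher's-z as

$$ z = \tfrac{1}{2} \log_e F = \tfrac{1}{2} \times 2.3026 \log_{10} F = 1.1513 \log_{10} \frac{s_1^2}{s_2^2} \quad (\text{if } s_1^2 > s_2^2) $$

$$ = 1.1513 \log_{10} \frac{23}{5.41} = 1.1513 \log_{10} 4.25 = 1.1513 \times 0.6284 = 0.72, $$

or z = 0.72 and $z_{.01}(8, 7)$ = 0.96. (The value 0.96 is the 1 percent point ; the
5 percent point $z_{.05}(8, 7)$ is about 0.66, which the observed z exceeds.)

So $z < z_{.01}(8, 7)$, leading to the acceptance of H01 at 1% level.

We, therefore, conclude that the samples have been taken
from the populations of equal variances.

Now we need to test the hyp. H02 : $\mu_1 = \mu_2$ and find ourselves
in the position to proceed for the computation of Student's-t as
follows.

$$ t = \frac{|\bar{x}_1 - \bar{x}_2|}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{184 + 37.875}{9 + 8 - 2} = \frac{221.875}{15} = 14.7917, \quad \therefore s = 3.84. $$

$$ t = \frac{|23.33 - 17.63|}{3.84\sqrt{\frac{1}{9} + \frac{1}{8}}} = \frac{5.7}{3.84 \times 0.486} = \frac{5.7}{1.87} = 3.05, $$

or t = 3.05 and $t_{.05}(15)$ = 2.13.

So $t > t_{.05}(15)$, leading to the rejection of H02 at 5% level.

Here we conclude that the samples have not been taken from
the populations of equal means.

Conclusion : Therefore, we arrive finally at the result that
the samples have not been taken from the same universe.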
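The variance-ratio step of Exp. (12) is sketched below. The function name `variance_ratio` is illustrative; it returns Snedecor's F (larger variance over smaller), Fisher's z = ½ log_e F, and the two degrees of freedom. Tabulated critical values are not computed here and must be read from tables.

```python
import math
import statistics


def variance_ratio(sample1, sample2):
    """Return (F, z, df_num, df_den) with the larger variance in the numerator."""
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    if v1 < v2:  # keep F >= 1 as the test requires
        v1, v2, df1, df2 = v2, v1, df2, df1
    f = v1 / v2
    return f, 0.5 * math.log(f), df1, df2


sample_1 = [17, 27, 18, 25, 27, 29, 27, 23, 17]
sample_2 = [16, 16, 20, 16, 20, 17, 15, 21]
F, z, df_num, df_den = variance_ratio(sample_1, sample_2)
print(f"F = {F:.2f}, z = {z:.2f} on ({df_num}, {df_den}) d.f.")
```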

5.6 χ²-Test :

If the observed data are classified in the form of a frequency
distribution and we want to test whether the observed data are in
agreement with a certain theory or hypothesis, we compute the
χ² statistic as $\chi^2 = \sum (O - E)^2/E$, where O, E stand for observed and
expected or theoretical frequencies respectively.

The sampling distribution of this χ² statistic was first obtained
by Helmert in 1875 and later on it was derived independently by
Prof. K. Pearson in 1900. The d.f. associated with χ²-distribution
is the no. of cells or classes whose hypothetical frequencies can be
determined independently. If the total no. of classes be n and k
linear restrictions are imposed on them, the d.f. for χ² is taken as
(n − k). In short, the test is carried out as follows.

First we calculate the expected frequencies on the basis of
the null hypothesis (H0) and then compute the χ² statistic as
$\chi^2 = \sum (O - E)^2/E$. If $\chi^2 \ge \chi^2_{.05}(n - k)$, we reject the hyp. at 5 percent,
but if $\chi^2 < \chi^2_{.05}(n - k)$, there is no evidence against the hyp. at 5%
level of significance.

Assumptions : The above test is based upon the following
assumptions or conditions :

(1) The sample is large i.e. N ≥ 50.

(2) The observations are independent.

(3) The constraints, if any, are always linear.

(4) None of the classes contains expected frequency < 5. Some
statisticians take this no. as 10 also.

In case the expected frequency of any cell is less than 5,
regrouping is carried on with the neighbouring classes to make the
frequencies ≥ 5.

Properties of χ² :

(1) If χ² = 0, there is a perfect agreement between the theory
and practice. But as the value of χ² deviates from zero, it
indicates a departure from this agreement. The greater the
departure the larger is the value of χ².

(2) For large d.f. (ν ≥ 30), χ² tends to normality.

(3) Additive Property : If $\chi_1^2, \chi_2^2, \ldots, \chi_k^2$ are all distributed
independently like χ²-distribution with $\nu_1, \nu_2, \ldots, \nu_k$
respective d.f. then the sum $\chi_1^2 + \chi_2^2 + \cdots + \chi_k^2$ will also
be distributed like χ² with d.f. $\nu = \nu_1 + \nu_2 + \cdots + \nu_k$.

(4) Partitioning Property : Any χ²-variate with ν d.f. can
be split up into its independent component χ²-variates
such that
$\chi^2(\nu) = \chi^2(\nu_1) + \chi^2(\nu_2) + \cdots + \chi^2(\nu_k)$, where $\nu = \nu_1 + \nu_2 + \cdots + \nu_k$.

Uses of χ²-test :

(1) It is used to test whether a given sample has been taken
from a population of specified variance.

(2) It is used to test the homogeneity of several variances
through the Bartlett-test.

(3) It is used as a test of goodness of fit for testing whether a
given sample has been taken from a specified population
or distribution.

(4) It is used to test the association of attributes.

(5) It is used to test the homogeneity of several observed
correlation coefficients.

(6) It is extensively used in the analysis of Genetical
Experiments especially to test the genetical hypotheses
and to detect the linkage.

5.6.1 Comparison of sample-variance with the population-variance :

If we want to test the hyp. that the sample has been taken
from a population of specified variance (σ²), we compute the statistic

$$ \chi^2 = \sum (x - \bar{x})^2/\sigma^2. $$

It follows a χ²-distribution with ν = (n − 1) d.f., where 'x'
stands for the sample-observations, '$\bar{x}$' for the sample-mean and
'Σ' for the summation over all values of the sample. If the
population-mean 'μ' is known in a problem, the proper statistic to
be computed there for the purpose is $\chi^2 = \sum (x - \mu)^2/\sigma^2$ which follows
a χ²-distribution with ν = n d.f.

If $\chi^2 \ge \chi^2_{.05}(\nu)$, we reject the hyp. at 5% level,
but if $\chi^2 < \chi^2_{.05}(\nu)$, there is no evidence against the hyp. at 5%
level of significance.

Exp. (13) : For a sample of 10, the sum of squares of the
deviations from the sample mean is 165.65. Test whether the
sample has been taken from a normal population with s.d. σ = 3 ?

Sol. (H0) : The sample has been taken from the population of
specified variance : σ² = 9.

Here we compute the statistic

$$ \chi^2 = \sum (x - \bar{x})^2/\sigma^2 = 165.65/9, $$

or χ² = 18.406 and $\chi^2_{.05}(9)$ = 16.919.

∴ $\chi^2 > \chi^2_{.05}(9)$, leading to the rejection of H0 at 5% level.

Conclusion : The sample has not been taken from the
population of specified variance.
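The computation of Exp. (13) amounts to a single division, as the following sketch shows. The function name is illustrative and 16.919 is the tabulated value quoted above.

```python
def chi2_for_variance(sum_of_squares, sigma_squared):
    """chi-square = sum of squared deviations from the sample mean / sigma^2."""
    return sum_of_squares / sigma_squared


chi2 = chi2_for_variance(165.65, 3 ** 2)  # n = 10, hence 9 d.f.
print(f"chi-square = {chi2:.3f}; reject H0 at 5%: {chi2 >= 16.919}")
```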

5.6.2 Testing the homogeneity of several variances :

If we want to test the hyp. that all the k samples have been
taken from the populations of equal variances i.e.
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$, we compute Bartlett's statistic as

$$ \chi^2 = \frac{n \log_e S_0^2 - \sum \nu \log_e S^2}{1 + \dfrac{1}{3(k-1)}\left[\sum\left(\dfrac{1}{\nu}\right) - \dfrac{1}{n}\right]}. $$

It follows a χ²-distribution with (k − 1) d.f.,
where k = no. of samples under consideration,

$n = \nu_1 + \nu_2 + \cdots + \nu_k$, $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$, ..., $\nu_k = n_k - 1$,

$n_1$ = size of the 1st sample, $n_2$ = size of the 2nd sample, ...,

$S_0^2 = [\nu_1 S_1^2 + \nu_2 S_2^2 + \cdots + \nu_k S_k^2]/n$,

$S_1^2 = \sum (x_1 - \bar{x}_1)^2/\nu_1$, $S_2^2 = \sum (x_2 - \bar{x}_2)^2/\nu_2$, ...,

$\sum \nu \log_e S^2 = \nu_1 \log_e S_1^2 + \nu_2 \log_e S_2^2 + \cdots + \nu_k \log_e S_k^2$,

and $\sum\left(\dfrac{1}{\nu}\right) = \dfrac{1}{\nu_1} + \dfrac{1}{\nu_2} + \cdots + \dfrac{1}{\nu_k}$.

If $\chi^2 \ge \chi^2_{.05}(k - 1)$, we reject the hyp. at 5% level,
but if $\chi^2 < \chi^2_{.05}(k - 1)$, there is no evidence against the hyp. at 5%
level of significance.
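Since no worked example accompanies Bartlett's statistic, a minimal sketch is given here. The function name `bartlett_chi2` is an assumption, and the illustration merely reuses the two sample variances of Exp. (12) (23 on 8 d.f. and 37.875/7 on 7 d.f.) to show the mechanics; the result is to be compared with the tabulated χ² for (k − 1) d.f.

```python
import math


def bartlett_chi2(variances, dfs):
    """Bartlett's chi-square from sample variances S_i^2 and their d.f. v_i."""
    k = len(variances)
    n = sum(dfs)  # n = v_1 + v_2 + ... + v_k
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / n  # S_0^2
    numerator = n * math.log(pooled) - sum(
        v * math.log(s2) for v, s2 in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / v for v in dfs) - 1 / n) / (3 * (k - 1))
    return numerator / correction, k - 1


chi2, df = bartlett_chi2([23.0, 37.875 / 7], [8, 7])
print(f"chi-square = {chi2:.2f} on {df} d.f.")
```

When all the sample variances are equal the numerator vanishes, so the statistic is zero, which is a convenient check on any implementation.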

5.6.3 Testing the independence of attributes in contingency-tables :

If the data are classified into m rows and n columns representing
the m classes according to one attribute, say A, and n classes
according to the other attribute, say B, thus in all into m × n classes,
then such a table is known as an m × n contingency-table.

Now, if we want to test the hyp. that these 'mn' classes are
independent, we calculate first on the basis of the hyp. of independence
the expected frequencies for all the cells and then compute
the desired statistic χ². The expected frequencies are calculated
by the formula

$$ E_{ij} = (R_i \times C_j)/N, $$

where $R_i$ stands for the total of i-th row, $C_j$ for
the total of j-th column, N for the grand total i.e. $N = \sum R_i$ (i = 1, 2, ..., m)
$= \sum C_j$ (j = 1, 2, ..., n), and $E_{ij}$ for the expected frequency which falls
into the cell of i-th row and j-th column. The desired statistic χ² is
computed as

$$ \chi^2 = \sum (O - E)^2/E, $$

where O, E stand for observed, expected frequencies respectively.

If $\chi^2 \ge \chi^2_{.05}(\nu)$, we reject the hyp. at 5% level ; ν = (m − 1)(n − 1),
but if $\chi^2 < \chi^2_{.05}(\nu)$, there is no evidence against the hyp. at 5%
level of significance.

Note : In a contingency-table, the expected frequencies for
the cells of the last row and last column should always be obtained
by subtraction. The d.f. associated with a χ² computed from an
m × n contingency-table is the no. of cells whose expected frequencies
can be calculated independently i.e. ν = (m − 1)(n − 1).

Exp. (14) : Classification of fields as irrigated and unirrigated
in a crop-cutting survey on wheat in four districts gave the
following results :

District          A      B      C      D     Totals
Irrigated        44     36     12     30      122
Unirrigated     130    167     98    143      538
Totals          174    203    110    173      660

Are these districts homogeneous in regard to proportion of irrigated
fields of wheat ?

(M. Sc. Ag. Agra, 1956)

Sol. As the process of testing the homogeneity is the same
as that of testing the independence, so the present problem may be
treated as the problem of a 2 × 4 contingency-table.

H0 : The districts are homogeneous in regard to the
proportion of irrigated fields of wheat i.e. the eight classes are
independent.


The computations for the expected frequencies and the desired
χ² are shown in the following table. The expected frequencies are
taken in whole numbers in this working (the unrounded value for
class 1, for instance, is 32.16) ; the conclusion is not affected.

Class No.   Obs. freq. O   Expected freq. E               O − E   (O − E)²   (O − E)²/E
    1           44         (174 × 122)/660 ≈ 33             11      121        3.667
    2           36         (203 × 122)/660 ≈ 37             −1        1        0.027
    3           12         (110 × 122)/660 ≈ 20             −8       64        3.200
    4           30         122 − (33 + 37 + 20) = 32        −2        4        0.125
    5          130         174 − 33 = 141                  −11      121        0.858
    6          167         203 − 37 = 166                    1        1        0.006
    7           98         110 − 20 = 90                     8       64        0.711
    8          143         173 − 32 = 141                    2        4        0.028
 Totals     660 = N        N = 660                                          χ² = 8.622

Now we see that χ² = 8.622 and $\chi^2_{.05}$ for (2 − 1)(4 − 1) = 3 d.f. is 7.815.
So $\chi^2 > \chi^2_{.05}(3)$, leading to the rejection of H0 at 5% level.

Conclusion : The districts are not homogeneous in regard
to the proportion of irrigated fields of wheat.
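The procedure of 5.6.3 is sketched below with unrounded expected frequencies $E_{ij} = R_i C_j/N$. The function name `contingency_chi2` is an illustrative choice. With the data of Exp. (14) the exact working gives a χ² of about 9.76, somewhat larger than the figure obtained from whole-number expected frequencies, and it leads to the same conclusion.

```python
def contingency_chi2(observed):
    """Return (chi-square, d.f.) for an m x n table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


fields = [[44, 36, 12, 30],       # irrigated
          [130, 167, 98, 143]]    # unirrigated
chi2, df = contingency_chi2(fields)
print(f"chi-square = {chi2:.2f} on {df} d.f.; reject H0 at 5%: {chi2 >= 7.815}")
```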

Exp. (15) : A new vaccine was tried on a certain no. of animals in a herd of cattle which was exposed to a certain disease. The nos. affected and not-affected in the categories vaccinated and not-vaccinated were as follows:

                    Affected   Not-affected   Totals
Vaccinated :           25           75          100
Not-vaccinated :       15           85          100
Totals                 40          160          200

Is the vaccine useful ? (M. Sc. Ag. Agra, 1958)

Sol. ($H_0$) : The vaccine is not useful i.e. the two characters viz. vaccination and effect are independent.

The computations for the expected frequencies and the desired $\chi^2$ are shown in the following table.

Class                    O     E                      O-E   (O-E)²   (O-E)²/E
1. Vac. affected         25    (40 × 100)/200 = 20      5     25      1.2500
2. Vac. not aff.         75    100 - 20 = 80           -5     25      0.3125
3. Not vac. aff.         15    40 - 20 = 20            -5     25      1.2500
4. Not vac. not aff.     85    160 - 80 = 80            5     25      0.3125
Totals                  200    200                      -      -      3.1250 = Σ(O-E)²/E

Now we see that $\chi^2 = 3.125$ and $\chi^2_{.05}[(2-1)(2-1)] = 3.841$.
So $\chi^2 < \chi^2_{.05}(1)$, leading to the acceptance of $H_0$ at 5% level.
Conclusion : The data provide no evidence that the vaccine is useful in controlling the disease in the animals.

5.6.3.1 Alternative method for computing $\chi^2$ for a 2×2 contingency-table :

If we have a problem of a 2×2 contingency-table of the type given below,

A \ B      B₁     B₂     Totals
A₁         a      b      R₁
A₂         c      d      R₂
Totals     C₁     C₂     N

then to test the hyp. that the two characters are independent, we can compute the statistic

$$\chi^2 = \frac{(ad - bc)^2 \, N}{R_1 R_2 C_1 C_2}$$ (provided none of the class frequencies < 5)

or

$$\chi^2 = \frac{\left(|ad - bc| - \frac{N}{2}\right)^2 N}{R_1 R_2 C_1 C_2}$$ (provided any of the class frequencies < 5).

If $\chi^2 > \chi^2_{.05}(1)$, we reject the hyp. at 5% level,
but if $\chi^2 < \chi^2_{.05}(1)$, there is no evidence against the hyp. at 5% level of significance.

Exp. (16) : In an orchard of 1000 trees, data were taken on the no. of shaded and unshaded trees, and in each of the classes the proportion of high to low yielding trees was noted. The results were recorded as in the following table.

                   Shaded   Unshaded
High-yielders :     350       205
Low-yielders :      250       195

Do these figures show that the shade has any effect on the yield of the trees ?

(M. Sc. Ag. Agra, 1961)

Sol. ($H_0$) : The shade has no effect on the yield of the trees.

The present problem can be arranged into a 2×2 contingency-table and the desired $\chi^2$-statistic can be computed directly as shown below.

Yield \ Shade    Shaded       Unshaded     Totals
High             350 = a      205 = b       555 = R₁
Low              250 = c      195 = d       445 = R₂
Totals           600 = C₁     400 = C₂     1000 = N

$$\chi^2 = \frac{(ad - bc)^2 \, N}{R_1 R_2 C_1 C_2} = \frac{(350 \times 195 - 205 \times 250)^2 \times 1000}{555 \times 445 \times 600 \times 400} = \frac{144500}{29637}$$

or $\chi^2 = 4.87$ and $\chi^2_{.05}(1) = 3.841$.
So $\chi^2 > \chi^2_{.05}(1)$, leading to the rejection of $H_0$ at 5% level.

Conclusion : The shade has an effect on the yield of the trees i.e. the two characters viz. shade and yield are not independent.

Exp. (17) : In an experiment on the immunization of goats from Anthrax, the following results were obtained.

                    Dead   Survived
Inoculated :          2       10
Not-inoculated :      6        6

Derive your inference on the efficiency of the vaccine.

Sol. ($H_0$) : The vaccination has no effect on the survival of goats from Anthrax.

The data can be arranged into a 2×2 contingency-table and the desired $\chi^2$-statistic can be computed directly with Yates' correction, as shown below.

Vac. \ Sur.    Dead       Survived    Totals
Inocu.         2 = a      10 = b      12 = R₁
Not-inocu.     6 = c       6 = d      12 = R₂
Totals         8 = C₁     16 = C₂     24 = N

$$\chi^2 = \frac{\left(|ad - bc| - \frac{N}{2}\right)^2 N}{R_1 R_2 C_1 C_2} = \frac{(|2 \times 6 - 10 \times 6| - 12)^2 \times 24}{12 \times 12 \times 8 \times 16} = \frac{(48 - 12)^2 \times 24}{12 \times 12 \times 8 \times 16} = \frac{36 \times 36 \times 24}{12 \times 12 \times 8 \times 16} = \frac{27}{16}$$

or $\chi^2 = 1.6875$ and $\chi^2_{.05}(1) = 3.841$.

So $\chi^2 < \chi^2_{.05}(1)$, leading to the acceptance of $H_0$ at 5% level.

Conclusion : The vaccination has no effect on the survival of goats from Anthrax i.e. the two characters viz. vaccination and survival are independent.
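
The direct 2×2 formula, with and without Yates' correction, may be sketched in Python as follows. The function name `chi_square_2x2` and its `yates` flag are hypothetical choices made for this illustration. Clamping the corrected difference at zero is a common safeguard that is assumed here and is not stated in the text.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for the 2x2 table [[a, b], [c, d]] by the direct formula."""
    r1, r2 = a + b, c + d          # row totals R1, R2
    c1, c2 = a + c, b + d          # column totals C1, C2
    n = r1 + r2                    # grand total N
    diff = abs(a * d - b * c)
    if yates:
        # Yates' correction: reduce |ad - bc| by N/2 (never below zero; assumption)
        diff = max(diff - n / 2, 0.0)
    return diff ** 2 * n / (r1 * r2 * c1 * c2)


# Exp. (16): no class frequency below 5, so no correction is applied
shade = chi_square_2x2(350, 205, 250, 195)
# Exp. (17): a class frequency below 5, so Yates' correction is applied
goats = chi_square_2x2(2, 10, 6, 6, yates=True)
print(round(shade, 2), round(goats, 4))
```

Both values are then compared with the tabulated 3.841 for 1 d.f., as in the two examples.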

5.6.4 Applications of $\chi^2$ in genetical experiments :

5.6.4. (a) Testing the significance of ratio in a single factor segregation :

Sometimes in a genetical problem of single factor segregation, we want to test the significance of the deviation of an observed segregation from a theoretical one i.e. the hyp. ($H_0$) that the classes or genes of a factor segregate in the given ratio. For the purpose, first we calculate on the basis of the hyp. the expected frequencies of the classes and then compute the statistic $\chi^2 = \sum (O - E)^2 / E$, where the symbols have their usual meanings.
If $\chi^2 > \chi^2_{.05}(\nu)$, we reject the hyp. at 5% level,
but if $\chi^2 < \chi^2_{.05}(\nu)$, there is no evidence against the hyp. at 5% level of significance.

Exp. (18) : In a single factor F₂-cross (Aa × Aa), the observed class frequencies for the two different classes 'A' and 'a' are 312 and 88 respectively. Test whether the data are in agreement with the Mendelian ratio 3 : 1.

Sol. ($H_0$) : The data are in agreement with the ratio 3 : 1.

The computations for the expected frequencies and the desired $\chi^2$ are shown in the following table.

Class     O     E                     O-E    (O-E)²   (O-E)²/E
A        312    (3 × 400)/4 = 300      12     144      0.48
a         88    400 - 300   = 100     -12     144      1.44
Totals   400 = N   N = 400              -       -      1.92 = Σ(O-E)²/E

Now we have $\chi^2 = 1.92$ and $\chi^2_{.05}(1) = 3.841$.

So $\chi^2 < \chi^2_{.05}(1)$, leading to the acceptance of $H_0$ at 5% level.


Conclusion : The observed data are in agreement with the 
ratio 3:1. 

Exp. (19) : In a cross between ivory and red snapdragons, an experimenter obtained the following results in the F₂-generation:

Phenotype :       Red    Pink    Ivory
No. of plants :    22     52      23

Test whether these figures show that the segregation occurs in a Mendelian ratio of 1 : 2 : 1.

Sol. ($H_0$) : The segregation occurs in the ratio 1 : 2 : 1 in the observed data.

The computations for the expected frequencies and the desired $\chi^2$ are shown in the following table.

Class     O     E                               O-E      (O-E)²    (O-E)²/E
Red       22    (1 × 97)/4 = 24.25             -2.25     5.0625    0.2087
Pink      52    (2 × 97)/4 = 48.50              3.50    12.2500    0.2525
Ivory     23    97 - (24.25 + 48.50) = 24.25   -1.25     1.5625    0.0644
Totals    97 = N   N = 97                         -         -      0.5256 = Σ(O-E)²/E

Now we have $\chi^2 = 0.526$ and $\chi^2_{.05}(2) = 5.991$.

So $\chi^2 < \chi^2_{.05}(2)$, leading to the acceptance of $H_0$ at 5% level.

Conclusion : The observed data are in agreement with the ratio 1 : 2 : 1.
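
The test of an observed segregation against a theoretical ratio can be sketched in Python as follows. The function name `chi_square_ratio` is hypothetical. The sketch keeps full precision in each term, so the last decimal place can differ slightly from a hand computation.

```python
def chi_square_ratio(observed, ratio):
    """Return (chi2, df) for observed class counts against a theoretical ratio.

    `observed` and `ratio` are sequences of equal length, e.g.
    observed=(312, 88), ratio=(3, 1).
    """
    n = sum(observed)
    parts = sum(ratio)
    chi2 = 0.0
    for o, r in zip(observed, ratio):
        expected = n * r / parts       # expected frequency under the ratio
        chi2 += (o - expected) ** 2 / expected
    return chi2, len(observed) - 1     # d.f. = no. of classes - 1


# Exp. (18): 3 : 1 ratio;  Exp. (19): 1 : 2 : 1 ratio
print(chi_square_ratio((312, 88), (3, 1)))
print(chi_square_ratio((22, 52, 23), (1, 2, 1)))
```

The returned d.f. tells which tabulated 5% value to use: 3.841 for 1 d.f. and 5.991 for 2 d.f., as quoted in the examples.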

5.6.4. (b) Testing the homogeneity of several families in a single factor segregation :

Sometimes in a genetical problem of single factor segregation, the data are available for a no. of families, say k, each with two genes or classes, say A and a, segregating in the given ratio, say m₁ : m₂.

Family     Genes            Totals
           A       a
1          A₁      a₁        R₁
2          A₂      a₂        R₂
...        ...     ...       ...
k          A_k     a_k       R_k
Totals     C₁      C₂        N

Here we may be interested in knowing the consistency of the families or groups. In such cases, the value of $\chi^2$ is calculated separately for each family and all these values are added up to get the total chi-square, say $\chi^2_T$, with k d.f. This $\chi^2_T$ is further partitioned into the following two independent $\chi^2$-components.

(1) $\chi^2_D$ : due to deviations from the theoretical ratio.

(2) $\chi^2_H$ : due to heterogeneity between the families.

Family           Computed $\chi^2$                                           D.F.    $\chi^2_{.05}$
1                $\chi^2_1 = (m_2 A_1 - m_1 a_1)^2 / (m_1 m_2 R_1)$          1       3.841
2                $\chi^2_2 = (m_2 A_2 - m_1 a_2)^2 / (m_1 m_2 R_2)$          1       3.841
...              ...                                                         ...     ...
k                $\chi^2_k = (m_2 A_k - m_1 a_k)^2 / (m_1 m_2 R_k)$          1       3.841
Totals           $\chi^2_T = \chi^2_1 + \chi^2_2 + ... + \chi^2_k$           k       ...
Due to dev.      $\chi^2_D = (m_2 C_1 - m_1 C_2)^2 / (m_1 m_2 N)$            1       3.841
Due to hetero.   $\chi^2_H = \chi^2_T - \chi^2_D$                            k-1     ...

The first component $\chi^2_D$ with 1 d.f. is obtained from the totals of the two genes over all the families, while the second component $\chi^2_H$ with (k-1) d.f. is obtained by subtracting $\chi^2_D$ from $\chi^2_T$. These two component-chisquares viz. $\chi^2_D$ and $\chi^2_H$ are used to test the following hypotheses (i) and (ii) respectively.

$H_0$ : (i) The data are in agreement with the given ratio m₁ : m₂.

(ii) The families are homogeneous in regard to the segregation ratio m₁ : m₂.

Finally, we arrive at the following results:

(1) If $\chi^2_D > \chi^2_{.05}(1)$, we reject the hyp. (i) at 5% level i.e. the families do not segregate in the given ratio, or the data on an average do not show an agreement with the given ratio m₁ : m₂.

But if $\chi^2_D < \chi^2_{.05}(1)$, there is no evidence against the hyp. (i) at 5% level i.e. the data show an agreement with the given ratio m₁ : m₂.

(2) If $\chi^2_H > \chi^2_{.05}(k-1)$, we reject the hyp. (ii) at 5% level i.e. the families are not homogeneous in regard to the segregation ratio m₁ : m₂.

But if $\chi^2_H < \chi^2_{.05}(k-1)$, there is no evidence against the hyp. (ii) at 5% level i.e. the families are homogeneous in regard to the segregation ratio m₁ : m₂.
Exp. (20) : For the data given in the table, test whether

(i) the genes 'A' and 'a' segregate in the ratio 3 : 1, and

(ii) the families are homogeneous in the segregation of (A, a) in the ratio 3 : 1.

Family \ Genes     A      a     Totals
1                  55     20      75
2                  89     25     114
3                  78     23     101
4                  61     19      80
Totals            283     87     370

Sol. ($H_0$) : (i) The genes (A, a) segregate in the ratio 3 : 1.

(ii) The families are homogeneous in regard to the segregation ratio 3 : 1.

The computations for the desired statistics viz. $\chi^2_T$, $\chi^2_D$ and $\chi^2_H$ are shown in the following table.

Family           Computed $\chi^2$                                           D.F.    $\chi^2_{.05}$
1                $\chi^2_1$ = (1 × 55 - 3 × 20)²/(3 × 1 × 75)  = 0.1111      1       3.841
2                $\chi^2_2$ = (1 × 89 - 3 × 25)²/(3 × 1 × 114) = 0.5731      1       3.841
3                $\chi^2_3$ = (1 × 78 - 3 × 23)²/(3 × 1 × 101) = 0.2673      1       3.841
4                $\chi^2_4$ = (1 × 61 - 3 × 19)²/(3 × 1 × 80)  = 0.0667      1       3.841
Totals           $\chi^2_T = \chi^2_1 + ... + \chi^2_4$ = 1.0182             4       9.488
Due to dev.      $\chi^2_D$ = (1 × 283 - 3 × 87)²/(3 × 1 × 370) = 0.4360     1       3.841
Due to hetero.   $\chi^2_H = \chi^2_T - \chi^2_D$ = 0.5822                   3       7.815

Now we see that:

(i) $\chi^2_D < \chi^2_{.05}(1)$, leading to the acceptance of hyp. (i) at 5% level.

(ii) $\chi^2_H < \chi^2_{.05}(3)$, leading to the acceptance of hyp. (ii) at 5% level.

Conclusion : The genes (A, a) segregate in the given ratio 3 : 1 and the families are homogeneous in regard to this segregation ratio.
Exp. (21) : For the figures given in the table, test whether

(a) : (i) the genes 'A' and 'a' segregate in the ratio 9 : 7, and

(ii) the families are homogeneous in the segregation of (A, a) in the ratio 9 : 7.

(b) : Also test the homogeneity of the five families in the ratio whatever they indicate.

Family     Genes           Totals
           A       a
1          70      55       125
2          35      26        61
3          56      29        85
4          34      32        66
5          39      24        63
Totals    234     166       400

Sol. (a). ($H_0$) : (i) The genes (A, a) segregate in the ratio 9 : 7.

(ii) The families are homogeneous in regard to the segregation ratio 9 : 7.

The computations for the desired statistics viz. $\chi^2_T$, $\chi^2_D$ and $\chi^2_H$ are shown in the following table.

Family           Computed $\chi^2$                                            D.F.    $\chi^2_{.05}$
1                $\chi^2_1$ = (7 × 70 - 9 × 55)²/(9 × 7 × 125) = 0.0032       1       3.841
2                $\chi^2_2$ = (7 × 35 - 9 × 26)²/(9 × 7 × 61)  = 0.0315       1       3.841
3                $\chi^2_3$ = (7 × 56 - 9 × 29)²/(9 × 7 × 85)  = 3.2047       1       3.841
4                $\chi^2_4$ = (7 × 34 - 9 × 32)²/(9 × 7 × 66)  = 0.6013       1       3.841
5                $\chi^2_5$ = (7 × 39 - 9 × 24)²/(9 × 7 × 63)  = 0.8186       1       3.841
Totals           $\chi^2_T = \chi^2_1 + ... + \chi^2_5$ = 4.6593              5       11.070
Due to dev.      $\chi^2_D$ = (7 × 234 - 9 × 166)²/(9 × 7 × 400) = 0.8229     1       3.841
Due to hetero.   $\chi^2_H = \chi^2_T - \chi^2_D$ = 3.8364                    4       9.488

Now we see that:

(i) $\chi^2_D < \chi^2_{.05}(1)$, leading to the acceptance of hyp. (i) at 5% level.

(ii) $\chi^2_H < \chi^2_{.05}(4)$, leading to the acceptance of hyp. (ii) at 5% level.

Conclusion : The genes (A, a) segregate in the given ratio 9 : 7 and the families are homogeneous in regard to this segregation ratio.

(b). $H_0$ : The families are homogeneous in regard to the segregation ratio whatever the data indicate i.e. 234 : 166.

Testing the homogeneity between the families in the ratio whatever they indicate in the observed data is the same as testing the independence of two characters. Hence, the question will be answered in the same way as that of testing the independence in an n×2 contingency-table. The computations for the expected frequencies and the desired $\chi^2$ are shown in the following table.


Class          O     E                                    O-E   (O-E)²   (O-E)²/E
Family 1. A    70    (234 × 125)/400            = 73       -3      9      0.1233
  ,,   2. A    35    (234 × 61)/400             = 36       -1      1      0.0278
  ,,   3. A    56    (234 × 85)/400             = 50        6     36      0.7200
  ,,   4. A    34    (234 × 66)/400             = 39       -5     25      0.6410
  ,,   5. A    39    234 - (73 + 36 + 50 + 39)  = 36        3      9      0.2500
  ,,   1. a    55    125 - 73                   = 52        3      9      0.1731
  ,,   2. a    26    61 - 36                    = 25        1      1      0.0400
  ,,   3. a    29    85 - 50                    = 35       -6     36      1.0286
  ,,   4. a    32    66 - 39                    = 27        5     25      0.9259
  ,,   5. a    24    63 - 36                    = 27       -3      9      0.3333
Totals        400 = N   N = 400                              -      -      4.2630 = Σ(O-E)²/E

Now we have $\chi^2 = 4.263$ and $\chi^2_{.05}(4) = 9.488$.

So $\chi^2 < \chi^2_{.05}(4)$, leading to the acceptance of hyp. at 5% level.

Conclusion : The families are homogeneous in regard to the 
segregation ratio 234 : 166. 
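
The partition of the total chi-square into a deviation component and a heterogeneity component, as used in part (a) of Exp. (21), can be sketched in Python as follows. The function name `partition_families` and the dictionary keys are hypothetical; the formulae are those of section 5.6.4 (b).

```python
def partition_families(families, m1, m2):
    """Partition chi-square for k families segregating in the ratio m1 : m2.

    `families` is a list of (A_i, a_i) counts. Returns the per-family
    chi-squares, their total (k d.f.), the deviation component (1 d.f.)
    and the heterogeneity component (k - 1 d.f.).
    """
    def component(big, small):
        # (m2*A - m1*a)^2 / (m1*m2*R), with R = A + a
        return (m2 * big - m1 * small) ** 2 / (m1 * m2 * (big + small))

    per_family = [component(big, small) for big, small in families]
    total = sum(per_family)
    c1 = sum(big for big, _ in families)      # column total of A
    c2 = sum(small for _, small in families)  # column total of a
    deviation = component(c1, c2)
    return {
        "per_family": per_family,
        "total": total,
        "deviation": deviation,
        "heterogeneity": total - deviation,
    }


# Exp. (21): five families tested against the ratio 9 : 7
result = partition_families(
    [(70, 55), (35, 26), (56, 29), (34, 32), (39, 24)], m1=9, m2=7
)
print({key: value for key, value in result.items() if key != "per_family"})
```

For the data of Exp. (21) the three totals come out close to 4.659, 0.823 and 3.836, in line with the table of part (a).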

5.6.4. (c) Testing the significance of ratio in two factor segregation :

Sometimes in a genetical problem of two factor segregation, the data are available for two characters or factors, say A and B, each with two classes or genes, say (A, a) and (B, b) respectively, segregating simultaneously in the given ratios, say m₁ : 1 and m₂ : 1 respectively. Here we may be interested in knowing whether the two factors are segregated in the given ratios independently, or are linked. Thus the problem may be looked upon as testing the hypotheses:

$H_0$ : (i) The genes (A, a) are segregated in the given ratio m₁ : 1.

(ii) The genes (B, b) are segregated in the given ratio m₂ : 1.

(iii) The two factors are independently segregated in the given ratios.

Here, on the basis of the hyp. of independence of the two characters and their segregation according to the given ratios, we obtain the four classes viz. AB, Ab, aB and ab. The frequencies of these four classes are supposed to be in the ratios m₁m₂ : m₁ : m₂ : 1 respectively. Thus if the observed frequencies of the above mentioned four classes are (in the order) a₁, a₂, a₃ and a₄ with N = a₁ + a₂ + a₃ + a₄, then their corresponding expected frequencies may be obtained as shown below.

$$E(a_1) = \frac{m_1 m_2 N}{(m_1 + 1)(m_2 + 1)}, \quad E(a_2) = \frac{m_1 N}{(m_1 + 1)(m_2 + 1)}, \quad E(a_3) = \frac{m_2 N}{(m_1 + 1)(m_2 + 1)}$$

and $E(a_4) = N - [E(a_1) + E(a_2) + E(a_3)]$.

Now we compute the desired statistic $\chi^2 = \sum (O - E)^2 / E$ by the usual method, which follows a $\chi^2$-distribution with 3 d.f.

If $\chi^2 < \chi^2_{.05}(3)$, there is no evidence against the hyp. ($H_0$) at 5% level. We conclude that the two characters are independent and also that they are segregated in the given ratios.

It should be noted that in such a situation the answer is complete at this stage. But in the contrary case, the procedure is carried further to detect the cause of its being significant, as stated below.

If $\chi^2 \geq \chi^2_{.05}(3)$, we reject the hyp. at 5% level of significance. In this situation, the significant value of $\chi^2$ may be supposed to be so owing to some discrepancy in the hyp. ($H_0$), which may be due to any one or more of the following three reasons.

(1) The genes (A, a) might not have segregated in the given ratio.

(2) The genes (B, b) might not have segregated in the given ratio.

(3) The two characters might not have segregated in the given ratios independently but might have given some evidence of linkage.

Thus in order to detect the above mentioned causes of discrepancy, we partition the above calculated $\chi^2$ into three independent component-chisquares each with 1 d.f. as shown in the following table.

Source due to    Computed $\chi^2$                                                           D.F.    $\chi^2_{.05}$
A-Character      $\chi^2_A = [a_1 + a_2 - m_1(a_3 + a_4)]^2 / (m_1 N)$                       1       3.841
B-Character      $\chi^2_B = [a_1 + a_3 - m_2(a_2 + a_4)]^2 / (m_2 N)$                       1       3.841
L-Linkage        $\chi^2_L = \chi^2 - (\chi^2_A + \chi^2_B)$                                 1       3.841
                 or $\chi^2_L = [a_1 + m_1 m_2 a_4 - (m_2 a_2 + m_1 a_3)]^2 / (m_1 m_2 N)$
Totals           $\chi^2 = \chi^2_A + \chi^2_B + \chi^2_L$                                   3       7.815

These three component-chisquares viz. $\chi^2_A$, $\chi^2_B$ and $\chi^2_L$ are used to test the hyp. (i), (ii) and (iii) respectively.

Finally, we arrive at the following results:

(1) If $\chi^2_A \geq \chi^2_{.05}(1)$, we reject the hyp. (i) at 5% level i.e. the genes (A, a) are not segregated in the given ratio.

But if $\chi^2_A < \chi^2_{.05}(1)$, there is no evidence against the hyp. (i) at 5% level i.e. the genes (A, a) are segregated in the given ratio.

(2) If $\chi^2_B \geq \chi^2_{.05}(1)$, we reject the hyp. (ii) at 5% level i.e. the genes (B, b) are not segregated in the given ratio.

But if $\chi^2_B < \chi^2_{.05}(1)$, there is no evidence against the hyp. (ii) at 5% level i.e. the genes (B, b) are segregated in the given ratio.

(3) If $\chi^2_L \geq \chi^2_{.05}(1)$, we reject the hyp. (iii) at 5% level i.e. the two characters are not segregated in the given ratios independently but show evidence of linkage.

But if $\chi^2_L < \chi^2_{.05}(1)$, there is no evidence against the hyp. (iii) at 5% level i.e. the two characters are segregated in the given ratios independently.

Exp. (22) : In a cross involving two Mendelian factors, the following results were obtained in the F₂ segregation. Are these observations in accordance with the hypothesis that the two factors segregate independently and that the four classes of offsprings are equally viable ? (M. Sc. Ag. Agra, 19(H)

        Flat leaves                   Crimpled leaves
Normal eye    Primrose eye      Normal eye    Primrose eye
   328            122               77             33

Sol. ($H_0$) : The two factors viz. the shape of leaves and the colour of eye are segregated independently in the ratio 3 : 1 each.

Let the two factors be A and B. Also, let 'A' stand for flat leaves, 'a' for crimpled leaves, 'B' for normal eye and 'b' for primrose eye. Now on the basis of the hyp. assumed, if (A, a) are segregated in the ratio 3 : 1, (B, b) are also segregated in the ratio 3 : 1, and the two characters are segregated in the given ratios independently, then the frequencies of the four classes AB, Ab, aB, ab will occur in the ratios 9 : 3 : 3 : 1 respectively. The expected frequencies and the desired $\chi^2$ are computed in the following table.


Class     O      E                                O-E    (O-E)²   (O-E)²/E
AB       328     (9/16) × 560            = 315     13     169      0.5365
Ab       122     (3/16) × 560            = 105     17     289      2.7524
aB        77     (3/16) × 560            = 105    -28     784      7.4667
ab        33     560 - (315 + 105 + 105) = 35      -2       4      0.1143
Totals   560 = N   N = 560                          -       -     10.8699 = Σ(O-E)²/E

Now we have $\chi^2 = 10.8699$ and $\chi^2_{.05}(3) = 7.815$.

So $\chi^2 > \chi^2_{.05}(3)$, leading to the rejection of hyp. at 5% level of significance. It shows some discrepancy in the hyp. assumed. In order to detect the cause of this discrepancy, we partition the above calculated $\chi^2$ into three independent component-chisquares viz. $\chi^2_A$, $\chi^2_B$ and $\chi^2_L$ as shown in the following table.

Source due to         Computed $\chi^2$                                               D.F.    $\chi^2_{.05}$
A-Shape of leaves     $\chi^2_A$ = [328 + 122 - 3(77 + 33)]²/(3 × 560) = 8.5714       1       3.841
B-Colour of eye       $\chi^2_B$ = [328 + 77 - 3(122 + 33)]²/(3 × 560) = 2.1429       1       3.841
L-Linkage             $\chi^2_L$ = 10.8699 - (8.5714 + 2.1429) = 0.1556               1       3.841
Totals                $\chi^2$ = 10.8699                                              3       7.815

Now we see that:

(i) $\chi^2_A > \chi^2_{.05}(1)$, leading to the rejection of hyp. (i) at 5% level,

(ii) $\chi^2_B < \chi^2_{.05}(1)$, leading to the acceptance of hyp. (ii) at 5% level, and

(iii) $\chi^2_L < \chi^2_{.05}(1)$, leading to the acceptance of hyp. (iii) at 5% level.

Conclusions :

(1) The character 'shape of leaves' is not segregated in the ratio 3 : 1.

(2) The character 'colour of eye' is segregated in the ratio 3 : 1.

(3) The test provides no evidence of linkage i.e. the two characters are segregated independently.
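
The three-way partition used in Exp. (22) can be sketched in Python as follows. The function name `partition_two_factor` is hypothetical. The linkage component is computed here from its own formula and not by subtraction, so that the identity $\chi^2 = \chi^2_A + \chi^2_B + \chi^2_L$ can be checked numerically.

```python
def partition_two_factor(a1, a2, a3, a4, m1, m2):
    """Chi-square partition for classes AB, Ab, aB, ab (counts a1..a4).

    The factors are assumed to segregate in the ratios m1 : 1 and m2 : 1.
    Returns (total, chi_a, chi_b, chi_l), where total has 3 d.f. and each
    component has 1 d.f.
    """
    n = a1 + a2 + a3 + a4
    denom = (m1 + 1) * (m2 + 1)
    # Expected frequencies in the ratio m1*m2 : m1 : m2 : 1
    expected = [m1 * m2 * n / denom, m1 * n / denom, m2 * n / denom, n / denom]
    total = sum((o - e) ** 2 / e for o, e in zip((a1, a2, a3, a4), expected))

    chi_a = (a1 + a2 - m1 * (a3 + a4)) ** 2 / (m1 * n)
    chi_b = (a1 + a3 - m2 * (a2 + a4)) ** 2 / (m2 * n)
    chi_l = (a1 + m1 * m2 * a4 - (m2 * a2 + m1 * a3)) ** 2 / (m1 * m2 * n)
    return total, chi_a, chi_b, chi_l


# Exp. (22): both factors tested against the ratio 3 : 1
total, chi_a, chi_b, chi_l = partition_two_factor(328, 122, 77, 33, m1=3, m2=3)
print(round(total, 4), round(chi_a, 4), round(chi_b, 4), round(chi_l, 4))
```

Each component is then compared with 3.841 (1 d.f.) and the total with 7.815 (3 d.f.), as in the example.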

EXERCISE V 

Q. 1. A sample of 100 plants was found to have a mean height 
of 73.65 cms. Could it be reasonably regarded as a simple 
random sample from a normal population whose mean is 75 cms. 
and s.d. is 3.0 cms. ? 

Q. 2 Suppose that 60 senior students in a college A and 80 senior students in another college B had mean statures of 69 and 67.5 inches respectively. If the s.d. for statures of all the senior students is 2.10 inches, is the difference between the mean statures of the two groups significant at 1 percent level of significance ? Given SNV 'Z' = 2.58.

Q. 3 A potential buyer of light bulbs bought 50 bulbs of each of two brands A and B. Upon testing these bulbs he found that brand A had a mean life of 1285 hours with a s.d. of 40 hours whereas brand B had a mean life of 1320 hours with a s.d. of 47 hours. Can the buyer be quite certain that the two brands A and B do differ in quality ?

Q. 4 A sample of 400 men from South India has a mean height of 65.85 inches and a s.d. of 2.50 inches, while a sample










of 100 men from North India has a mean height of 66.20 inches with a s.d. of 2.52 inches. Do the data indicate that the North Indians are on the average taller than the South Indians ?

Q.5. A random sample of 200 villages was taken from 
Gorakhpur District and the average population per village was 
found to be 485 with a s.d. of 50. Another random sample of 
200 villages from the same district gave an average population of 
510 per village with a s. d. of 40. Is the difference between the 
averages of two samples statistically significant ? Give reasons. 

Q. 6 (a) A random sample of 1000 farms in a certain year in Punjab gives an average yield of wheat of 20 mds./acre with a s.d. of 1.92 mds./acre. Another random sample of 1000 farms in the same year in U. P. gives an average yield of wheat of 21 mds./acre with a s.d. of 2.24 mds./acre. Test whether the average yields in the two provinces are the same.

(b) Explain the terms null hypothesis and 5 percent level of 
significance ? 

Q. 7 Find Student's 't' for the following sample of eight drawn from a universe with zero mean:

-4, -2, -2, 0, 2, 2, 3, 3.

Q. 8 A certain drug administered to each of 12 patients resulted in the following increases of blood-pressure:

5, 3, 8, -1, 3, 0, 6, -2, -1, 5, 0, 4.

Can it be concluded that the stimulus (drug) will in general 
be accompanied by an increase in blood-pressure ? 

Q. 9 The yields of two types, 'Type 17' and 'Type 51', of grains in pounds per acre in 6 replications are given below. What comment would you make on the differences in the mean yields ? You may assume that if there be 5 d.f. and p = 0.2, t is 1.476.

Replications :    1       2       3       4       5       6
Type 17 :       20.50   24.60   23.06   29.98   30.37   23.83
Type 51 :       24.86   26.39   28.19   30.75   29.97   22.04

Q. 10 Determination of protein content of five varieties of wheat by a standard and a newly developed rapid method gave the following results (in units of grams per 100 grams).

Varieties :          A      B      C      D      E
Standard Method :   13.5   13.0   13.0   13.6   14.1
Rapid Method :      13.1   13.1   12.8   13.1   13.7

Do the estimates obtained by the two methods differ significantly ?

Q. 11 Test whether a small electric current affects the growth of maize-seedlings. Ten pairs of plants were grown in parallel boxes and one member of each pair was treated by receiving a small electric current. The differences in height (in mm.) between the treated and untreated were as follows.

6.0, 1.3, 10.2, 23.0, 3.1, 6.8, -1.5, -14.7, -3.3, 11.1.

Q. 12 (a) The following figures give the percentage extension under a given load of two random samples of yarn, the first sample being taken before washing, the second after six washings:

Before washing :       12.3   13.7   10.1   11.4   14.9   12.6
After six washings :   15.7   10.3   12.6   14.3   P/5    13.8   1\*9

Is there any evidence that extensibility is affected by washing ?

(b) In another experiment on the same type of yarn, six lengths of yarn were selected at random and each length was cut into two halves. One of the halves was tested for extension without washing, the other after six washings, and the following percentage extensions were obtained:

Length :                1      2      3      4      5      6
Before washing :       13.9   12.5   11.0   11.8   10.8   14.6
After six washings :   14.7   12.1   13.2   13.6   11.5   15.4

Is there any evidence that extensibility is affected by washing ?
Q. 13 Eight pots growing three wheat plants each were exposed to a high tension discharge while nine similar pots were enclosed in an earthed wire cage. The number of tillers in each pot were as follows:

Caged :        17  26  18  25  27  28  26  23  17
Electrified :  16  16  22  16  21  18  15  20

See whether electrification exercises any real effect on the tillering by using the t test of significance. (I.C.A.R. 1956)

Q. 14 Two horses A and B were tested according to the time (in seconds) to run on a particular track with the results:

Horse A :  28  30  32  35  33  29  34
Horse B :  29  30  30  24  27  29

Test whether you can discriminate between the two horses.

Q. 15 The following data represent the yields in bushels of Indian corn on ten sub-divisions of equal areas of two agricultural plots in which plot I was a control plot treated the same as plot II, except for the amount of phosphorus applied as a fertilizer:

Plot I :   6.2  5.7  6.5  6.0  6.3  5.8  5.7  6.0  6.0  5.8
Plot II :  5.6  5.9  5.6  5.7  5.8  5.7  6.0  5.5  5.7  5.5

Is there a significant difference between the yields on the two plots, using the difference between their means as a criterion of judgement ?

Q. 16 Two samples of sizes 10 and 12 give the sum of squares of deviations from their respective means 60.3 and 61.2 as 180 and 132 respectively. Can they be regarded as drawn from the same normal population ?

Q. 17 Why are there different tests in Statistics for testing the significance of a mean difference ?

Mitchell conducted a paired feeding experiment with pigs on the relative value of limestone and bone-meal for bone-development. The results are ash content in % of scapulas of pairs fed on limestone and bone-meal.

Pair :        1     2     3     4     5     6     7     8
Limestone :  49.2  53.3  50.6  52.0  46.8  50.5  52.1  53.0
Bone-meal :  51.5  54.9  52.2  53.3  51.6  54.1  54.2  54.3

Determine the significance of the difference between the means

(1) by assuming that the values are paired, and

(2) by assuming that the values are not paired. (I.C.A.R. 1956)

Q. 18 (a) Define Student’s ‘t’ and give its applications. 

(b) The following yields (in pounds per plot) were obtained 
from five plots each of two varieties of wheat A and B— 

Variety A: 12 10 12 13 13 

Variety B: 8 9 11 10 11. 

Test the significance of the difference between the varieties by means of both t and F tests, assuming (i) the plots are paired, and (ii) they are independent, there being no correspondence between the plots of the two varieties. (M. Sc. Ag. Agra, 1957)

Q. 19 Ten determinations of a quantity subject to error are:
6-6, 0*7, 6*8, 0'4, 7% 69, 66, 5*8, 0*3, 0 5. 

Judge whether the sample mean is compatible with a normal population with mean 7.0 and s.d. unity. Show that your inference is reversed if the population-variance be unknown. (M. Sc. Ag. Agra, 1962)

Q. 20 For two random samples of sizes 10 and 12, it is given that $\bar{x}_1 = 22$, $\bar{x}_2 = 25$, $\sum (x_1 - \bar{x}_1)^2 = 120$ and $\sum (x_2 - \bar{x}_2)^2 = 314$. Test whether the two samples are drawn from the same normal population.

Q. 21 Two halves of a field were sown with two varieties of wheat A & B. One hundred earheads were selected randomly from each half and the no. of grains in each earhead counted. The data gave the following results:

Sample A : mean = 30.5, s.e. of mean = 0.5
Sample B : mean = 32.1, s.e. of mean = 0.6.

Test the significance of the difference between the means and give your inference from the results. What would you have concluded if the above values had been obtained from samples of 5 earheads each instead of 100 earheads ? (M. Sc. Ag. Agra, 1950)

Q. 22 Lots of ten bees were fed two concentrations of syrup, 20% and 65%, at a feeder half a mile from the hive. Upon arrival at the hive their honey-sacs were removed and the concentration of fluid measured. In every case there was a decrease from feeder concentrations. The decreases were:

From 20% syrup : 0.7, 0.5, 0.4, 0%, 0.5, 0.4, 0.7, 0.4, 0.2, 0.5
From 65% syrup : 1.7, 2.8, 2.2, 1.4, 1.3, 2.1, 0.8, 3.4, 1.9, 1.4.

Test whether the decrease in concentration during flight differs significantly with the two syrups. (M. Sc. Ag. Agra, 1960)

Q. 23 (a) What is the difference between 't' and 'F' tests, and when are the two tests identical ? (M. Sc. Ag. M.U. 1967)

(b) Write a critical note on tests of significance and their uses in agricultural statistics. (M. Sc. Ag. M.U. 1967)

Q. 24 Discuss the uses of $\chi^2$ in genetical analysis.

It is claimed that students from cities are more sociable than students from the villages. Test if this claim is valid for the data:

                           Social   Non-social
Students from cities :       10         3
Students from villages :      2        15

(M.A. Patna, 1955)

Q. 25 The following data are observed for hybrids of Datura:

Flowers violet, fruits prickly ... 47
   ,,   violet,   ,,   smooth  ... 12
   ,,   white,    ,,   prickly ... 21
   ,,   white,    ,,   smooth  ...  3

Using the $\chi^2$-test, find the association between colour of flowers and character of fruit. (M. Sc. Ag. Agra, 1956)

Q. 26 Two samples of polls of votes for two candidates A and B for a public office are taken, one from among residents of urban areas and the other from residents of rural areas:

Area \ Votes for     A      B     Totals
Rural :             620    380     1000
Urban :             550    450     1000

Examine whether the nature of the area is related to voting preference in this election. (I.A.S. 1956)

Q. 27 The following data show the effect of vitamin B-deficiency on the sex ratio of the offsprings of rats. Is the effect significant ?

                          Male   Female   Totals
Vitamin B deficient :      123     153      276
Vitamin B sufficient :     145     150      295

Q. 28 Twelve inoculated experimental animals and the other 12 not-inoculated animals were exposed to the infection of a disease. The following frequencies of dead and surviving animals were noted in the two cases.

                    Dead   Survived   Totals
Inoculated :          2       10        12
Not-inoculated :      8        4        12

Can the inoculation be said to show evidence of preventing the disease ?
Q. 29 In an experiment with immunization of cattle from tuberculosis, the following results were obtained:

                    Affected   Unaffected
Inoculated :           12          2>
Not-inoculated :       16          6

Examine the effect of vaccine in controlling susceptibility to tuberculosis. (I.A.S. 1948)

Q. 30 The following data relate to two types of twins:

               Both right handed   One left and the other right handed
Fraternal :           10                           2
Identical :            5                           5

Examine whether identical twins are different from fraternal twins in having a lower proportion of cases of both members of the twins being right-handed. (M.A. Patna, 1955)

Q. 31 In a public preference survey, the people interviewed were classified according to their opinion regarding intercaste-marriage and their age as follows:

Opinion                              Age in years
                            19-25   26-35   36-55   Over
Unconditional support :       76     i>5      96     10
Conditional support :         69     117     1 6     17
Indifference :                14      27      35      4
Conditional opposition :      60     168     210     46

Examine whether and in what way opinion regarding intercaste-marriage changes with age. (M.A. Patna, 1953)

Q. 32 (a) In experiments on pea-breeding, Mendel obtained
the following frequencies of seeds:

Round and yellow = 315,  Wrinkled and yellow = 101,
Round and green = 108,   Wrinkled and green = 32.

Theory predicts that the frequencies should be in the
proportions 9 : 3 : 3 : 1. Examine the correspondence between theory
and experiment. (M.Sc.Ag. Agra, 1952)

(b) Define $\chi^2$ and write in brief the conditions or assumptions for
the application of the $\chi^2$-test. What is meant by degrees of freedom ?

Q. 33 (a) From the following data for 4 segregating $F_2$-families,
test whether they agree with a 3 : 1 ratio and also whether they
agree with one another in whatever ratio they indicate:

Family       Class
            A     B

1 :        72    21

2 :        55    20

3 :        80    25

4 :        78    23.

(b) Define the $\chi^2$ test and write its applications.

Q. 34 In a back-cross progeny (pt/PT x pt/pt), the observed
frequencies in the four classes are given below.

Class          :  PT    Pt    pT    pt

Observed freq. :  190   38    35    204.

Is there any evidence for the existence of linkage in the coupling phase ?
Q. 35 In an $F_2$ segregating for two characters A and B, the
number of plants observed in the classes AB, Ab, aB and ab were:

AB    Ab    aB    ab    Total

290   90    75    25    480.

Test whether each of the genes is segregating according to a
3 : 1 ratio and whether the two characters are segregating
independently. Do you think that there is any evidence of linkage ?

Q. 36 Genetic theory states that children having one parent
of blood type M and the other of blood type N will always be one
of the three types M, MN, N and that the proportions of the three types
will on average be as 1 : 2 : 1. A report states that out of 300
children having one M parent and one N parent, 30 percent were
found to be type M, 45 percent type MN and the remainder type N.
Test the hypothesis by the $\chi^2$ test.

Q. 37 In an $F_2$ segregating for two characters A and B, the
no. of plants observed in the classes AB, Ab, aB and ab are as follows:

AB    Ab    aB    ab    Total

289   92    73    26    480.

Test whether each of the genes is segregating according to a 3 : 1 ratio
and whether the two genes are segregating independently. Do you think
there is any evidence of linkage ? (M.Sc.Ag. Agra 1958)

Q. 38 In the $F_2$-generation of a cross between a two-rowed
variety of barley producing green seedlings (VVLgLg) with a six-rowed
variety with light green seedlings (vvlglg), the following nos.
of plants were obtained in the four phenotypic classes.

Phenotypic class :  VLg   Vlg   vLg   vlg

No. of plants    :  281   59    60    58.

Calculate the goodness of fit between the ratio observed and the ratio
expected on the basis of independent inheritance of the two factor pairs.
(M. Sc. Ag. AU, 1965)
Q. 39 In a herd of cattle composed of four different breeds,
the total nos. of animals of each breed and of animals affected
by a certain epidemic disease in each breed are as follows:

Breed                   :   A     B     C     D

Total no. of animals    :  200   250   300   250

No. of animals affected :   If    27    38    24.

Do the breeds differ significantly in their susceptibility to the
disease ? (M.Sc.Ag. AU, 1958)
Q. 40 It was observed that out of 463 smokers 55 were found 
to suffer from heart trouble, while from 337 non-smokers only 25 
were found to be affected thus. Would you conclude from these 
data that smoking affects health ? (M Sc. Ag. AU, 1959) 

Q. 41 From the following table showing the no. of plants
having certain characters, test the hypothesis that the flower-colour
is independent of flatness of leaves.

                  Flat leaves   Curled leaves
White flowers :       99             36

Red flowers   :       20              5. (M.Sc.Ag. AU, 1957)

Q. 42 The following data occur in a memoir of Karl Pearson:

                               Eye colour in sons
                              Not light     light
Eye colour   { Not light :       230         148
in fathers   { light     :       151         471.

Test whether the colour of the son's eyes is associated with that
of the father's. (IAS, 1942)
Q. 43 In an antimalarial campaign in a certain area of Baraut,
quinine was administered to 500 persons out of a total population
of 1000. The no. of fever cases is shown below:

Treatment  :  Fever   No-Fever
Quinine    :   200      300

No-Quinine :   150      350.

Test the usefulness of quinine in preventing malaria attack.

Q. 44 From the figures given below, test whether the 
intelligence of the sons is associated with that of the fathers. 

Int. fathers with int. sons = 20, Int. fathers with dull sons = 30,
Dull fathers with int. sons = 40, Dull fathers with dull sons = 70.

Q. 45 The following table gives the no. of aircraft accidents
that occurred during the various days of the week. Test whether
the accidents are uniformly distributed over the week.

Days             :  Sun.  Mon.  Tue.  Wed.  Thu.  Fri.  Sat.  Total
No. of accidents :   14    16     8    12    11     9    14    84.

Q. 46 Two hundred digits were chosen at random from a set
of tables. The frequencies of the digits were as follows:

Digits      :   0    1    2    3    4    5    6    7    8    9
Frequencies :  18   19   23   21   16   25   22   20   21   15.

Use the $\chi^2$-test to assess the correctness of the hypothesis that the
digits were distributed in equal nos. in the table from which these nos.
were taken.

ANSWERS 

(1) SNV ‘z'=4-5 (2) SNV ‘a’— 3 - 67 (3) SNV V=401 (4) SNV 
*z*= 1 *25 (5) SNV ‘z’=5-5 (6) [a]. SNVV=10-75 (7) Student 

V «0-266 (8) Student 7’ =2*688 (9) Paired V = 1-55 (10) Paired 
V— 2 99 (II) Paired 7=1-33 (12) [a]. Fisher ‘/'=0*54, [b].Paired 
7’=2G2 (13)Fisher */’=2 74 (14)Fisher 7’=244 (15)Paired 7’=263 
(16) Fisher 7’ =0-53 (17) [i]. Paired 7’=4 - 44 [iij. Fisher 7’=2‘61 
(18) fbl. (i) F=1'13. Paired 7=3-79, (ii) Fisher 7=5-5 (19) SNV 
•z'=l-68, Student 7’=3vl (20) F=215, Fisher 7’=l-5 (21) SNV 
< a’=2-05, Fisher 7’=2-05 (22) Fisher 7=5-6 (24) X 2 = 1045 

(25) X 2 =0 28 (26) X 2 =8‘97 (27) X 2 =0 12 (28) X 2 =4’286 (29) X 2 =948 
(30)X 2 =1"47 (31)X 2 =39"03 (32) [al X*=0’47 (33) [a]. X 2 T =r241 
X 2 d=0-634, X 2 h=0-607.X 2 (3)=0-938, (34) X 2 =22P52 X 2 J) =0-26, 
X 2 t =0 62, X 2 ,. =220 64 (35) X 2 =4'81, X 2 a=4-44, X 2 b =0018,X 2 l=0-09 
(36) X 2 =45 (37) X 2 =5 12, X 2 a=4-90, X 2 b =*004, X 2 l=0'18 
(38) X 2 =48‘473, X 2 v=0‘143 X 2 { g=0-073, X 2 L =48-257 (39) X 2 =?084 
(40) X 2 =4'31 (41) X a =0'253 (42) X 2 =3'84 (43) X 2 =10’989 

(44) X 2 =0-194 (45) X 2 =4767 (46) X 2 =4‘3. 



Chapter VI 

Analysis of Variance 

6.1 Introduction : Planning an experiment for a given purpose is
generally found quite troublesome, because an experiment conducted
to study the desired character may end up testing some other
character and, if not properly designed, may not serve any useful
purpose at all. Thus one who deals with applied research work
must be familiar with the techniques of design of experiments.
One technique for analyzing the experimental data is the analysis
of variance. It tests the homogeneity of the observations with
regard to the character(s) under study, i.e. the homogeneity of
several means for the character(s), through the F test(s).

6.2 Meaning and definition : It is a well known fact that
variation is inherent in nature. Since we cannot find any two
items exactly alike, variation is natural in the measurements
of an experiment. The total variation present in the body of the data
is generally due to several factors. These factors are of two types:

(i) Assignable factor, and (ii) Chance factor.

The assignable factor is one which is easily traceable, while
the chance factor is the result of a large number of small independent
causes which cannot be traced separately. The technique of analysis
of variance in the first place estimates the amount of variation due
to each factor separately and then compares the estimate of the
assignable factor with that of the chance factor. The estimate of
the amount of variation due to the chance factor is called the
experimental error, or error. Thus the analysis of variance can be
defined as the technique of partitioning the total variation present
in the experimentally observed data into its component variations
due to the different factors, and then comparing them. The technique
was developed by Prof. R. A. Fisher.

6.3 Applications of analysis of variance : The analysis of
variance is a powerful statistical tool* to test the homogeneity of
the observations, i.e. to test the hyp. ($H_0$) that the observations
in the data are drawn from the same normal population. Some other
uses of this technique are made in testing the linearity of the fitted
regression line and the significance of the correlation-ratio '$\eta$'.

* It can be used as a tool to test the equality or homogeneity
of several sample-means and hence may be treated as the
generalization of Fisher 't' test used for testing the equality of two
sample-means.

6.4 Assumptions for analysis of variance. 

(1) The observations are independent. 

(2) The parent-population is normal with unknown s.d. '$\sigma$'.

(3) The treatment and environmental-effects are additive. 

(4) The different groups are homoscedastic. 

6.5 Procedure of analysis of variance : The procedure
of analysis of variance with its three main sub-headings, viz.
one-way classification, two-way classification and three-way classification,
is illustrated below in five main steps.

(1) Set up the hyp. ($H_0$) that the data is homogeneous with
regard to the factors of classification.

(2) Arrange the data in a suitable tabular form as required
according to the number of factors.

(3) Compute the total sum of squares of deviations from the mean and
then partition it into its component sums of squares.

(4) Summarize the results in the analysis of variance table
called 'ANOVA'.

(5) State briefly the conclusions derived from ANOVA at
some particular level of significance, say 5%.

6.5.1 One-way classification : In this type of classification,
the data is classified with respect to one factor only. The total
observations 'n' of the data may be classified into 'k' different
groups or classes each having '$n_i$' ($i = 1, 2, \ldots, k$) observations with
respect to some criterion of classification such that $n = \sum_i n_i$. For
example, n cows of k different breeds may be classified into
k classes such that $n_1$ of them may belong to the 1st breed, $n_2$ to the
2nd, ..., and $n_k$ to the k-th breed. If $y_{ij}$ ($i = 1, 2, \ldots, k$ for breed no.,
$j = 1, 2, \ldots, n_i$ for cow no.) denotes the milk-yield of the j-th cow of the i-th
breed, the milk-yields of all the cows can be arranged in the following
tabular form.

Breed No.                        Cow No.                            Totals = $T_i$
              1          2       ...      j       ...     $n_i$

1          $y_{11}$   $y_{12}$   ...   $y_{1j}$   ...   $y_{1n_1}$      $T_1$

2          $y_{21}$   $y_{22}$   ...   $y_{2j}$   ...   $y_{2n_2}$      $T_2$

...

i          $y_{i1}$   $y_{i2}$   ...   $y_{ij}$   ...   $y_{in_i}$      $T_i$

...

k          $y_{k1}$   $y_{k2}$   ...   $y_{kj}$   ...   $y_{kn_k}$      $T_k$

Total                                                                    G

In this table, $T_i$ stands for the total milk-yield of the cows of the i-th
breed, and G for the grand total of the milk-yields of all the cows
such that $G = \sum_i T_i = \sum_i \sum_j y_{ij}$.

Here we take the hyp. ($H_0$) that the milk-yields are not affected
by the breeds of the cows, or the breeds are not significantly different,
or the data are homogeneous with regard to breeds. The factors
or sources of variation are obviously (i) breed, and (ii) error. The
required sums of squares* are computed as shown below.

(1) Total sum of squares $= \sum_i \sum_j y_{ij}^2 - CF$, where CF (correction
factor) $= G^2/n$,

i.e. TSS $= S$, say.

(2) Sum of squares due to breeds $= \sum_i T_i^2/n_i - CF$,

i.e. SS (breeds) $= S_1$, say.

(3) Sum of squares due to error = TSS - SS (breeds),

i.e. SS (error) $= S_2$, say.

Now on the basis of the above mentioned hypothesis of homogeneity,
the two independent estimates of the variance ($\sigma^2$) do not differ
significantly. These estimates of variance are obtained by dividing the
SS by their respective d.f. The significance of these estimates can be
tested by using the 'F'-test. If the two estimates differ significantly
at a given level, the factor of classification is said to have an
influence on the variate-values. The results are summarized in the
following analysis of variance table.

ANOVA

Source of    DF                  SS        MS                    F             F.05
variation

Breed        $k - 1 = \nu_1$     $S_1$     $S_1/\nu_1 = V_1$     $V_1/V_E$     ...

Error        $n - k = \nu_2$     $S_2$     $S_2/\nu_2 = V_E$     -             -

Totals       $n - 1$             $S$       -                     -             -

Finally, if 'F' comes out to be significant, that is, if $F \ge F_{.05}(\nu_1, \nu_2)$,
we reject the hypothesis at 5% level and hence conclude that the
milk-yields are affected by the breeds of the cows. But in the contrary
case, we say that there is no evidence against the hypothesis at 5%
level of significance.
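
The one-way computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_anova` and the milk-yield figures are hypothetical and do not come from the text, and the tabulated value F.05(2, 7) = 4.74 is read from a standard F table.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities for a list of classes (each a list of numbers)."""
    n = sum(len(g) for g in groups)              # total observations n
    k = len(groups)                              # number of classes k
    grand_total = sum(sum(g) for g in groups)    # G
    cf = grand_total ** 2 / n                    # correction factor G^2/n
    tss = sum(y * y for g in groups for y in g) - cf
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # sum T_i^2/n_i - CF
    ss_error = tss - ss_between
    df_between, df_error = k - 1, n - k
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "tss": tss, "ss_between": ss_between, "ss_error": ss_error,
        "df_between": df_between, "df_error": df_error,
        "ms_between": ms_between, "ms_error": ms_error,
        "F": ms_between / ms_error,
    }

# Hypothetical milk-yields of cows from three breeds (unequal class sizes).
yields = [[12, 15, 14], [18, 20, 19, 17], [11, 10, 13]]
result = one_way_anova(yields)

F_05 = 4.74   # tabulated F.05(2, 7), an assumption taken from a standard table
print(result)
print("significant at 5% level" if result["F"] >= F_05 else "not significant at 5% level")
```

The dictionary keys mirror the columns of the ANOVA table, so the output can be copied into the table row by row.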

Note : If in any case the error MS, i.e. the error variance, is
found to be greater than the variance of the factor, we need not
calculate the value of 'F' corresponding to the factor, because in
such a situation we simply conclude that the factor of classification
is not significant since F < 1.

* If the data are large numbers, the sum of squares may be
computed from the new figures of the data obtained as residuals
after deviations from some convenient assumed mean, because the
SS are unaffected by the change of origin.
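
The remark that the sums of squares are unaffected by a change of origin can be checked numerically. The sketch below is illustrative only: the helper `one_way_ss`, the yield figures and the assumed mean of 250 are all hypothetical choices, not values from the text.

```python
def one_way_ss(groups):
    """Return (TSS, SS between classes, SS error) for a one-way classification."""
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n                      # correction factor G^2/n
    tss = sum(y * y for g in groups for y in g) - cf
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    return tss, ss_between, tss - ss_between

# Hypothetical yields (large numbers) for two classes.
raw = [[249, 245, 251], [264, 259, 262]]
assumed_mean = 250                                 # any convenient origin works
shifted = [[y - assumed_mean for y in g] for g in raw]

ss_raw = one_way_ss(raw)
ss_shifted = one_way_ss(shifted)
print(ss_raw)
print(ss_shifted)
```

Both calls give the same three sums of squares, while the shifted figures keep the arithmetic small enough to do by hand.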

6.5.2 Two-way classification : In this type of classification, the
data is classified with respect to two factors. The total observations
'n' of the data may be classified into 'k' different classes each having
'm' observations with respect to some criterion of classification, and
into 'm' different classes each having 'k' observations with respect
to some other criterion of classification such that n = mk. For
example, n cows of k different breeds and m different lactation-periods
may be classified into k classes each having m cows of different
lactation-periods, and also into m classes each having k cows of
different breeds. If $y_{ij}$ ($i = 1, 2, \ldots, k$ for breed no., $j = 1, 2, \ldots, m$ for
lactation-period no.) denotes the milk-yield of the cow of the i-th breed with
the j-th lactation-period, then the milk-yields of all the cows can be arranged
in the following tabular form.

Breed No.                   Lactation-period No.                    Totals = $T_i$
                  1          2       ...      j       ...      m

1              $y_{11}$   $y_{12}$   ...   $y_{1j}$   ...   $y_{1m}$      $T_1$

2              $y_{21}$   $y_{22}$   ...   $y_{2j}$   ...   $y_{2m}$      $T_2$

...

i              $y_{i1}$   $y_{i2}$   ...   $y_{ij}$   ...   $y_{im}$      $T_i$

...

k              $y_{k1}$   $y_{k2}$   ...   $y_{kj}$   ...   $y_{km}$      $T_k$

Totals = $B_j$   $B_1$      $B_2$    ...    $B_j$     ...    $B_m$          G

In this table, $T_i$ stands for the total milk-yield of the cows of the i-th
breed; $B_j$ for the total milk-yield of the cows of the j-th lactation-period;
and G for the grand total of the milk-yields of all the cows such that
$G = \sum_i T_i = \sum_j B_j = \sum_i \sum_j y_{ij}$.

Here we take the hyp. ($H_0$) that the milk-yields are not affected
by the breeds and the lactation-periods of the cows. The sources of
variation are obviously (i) breed, (ii) lactation-period, and (iii) error.

The required sums of squares are computed as shown below:

(1) TSS $= \sum_i \sum_j y_{ij}^2 - CF = S$, say; where $CF = G^2/n$.

(2) SS (breeds) $= \sum_i T_i^2/m - CF = S_1$, say.

(3) SS (lact. pds.) $= \sum_j B_j^2/k - CF = S_2$, say.

(4) And SS (error) = TSS - SS (breeds + lact. pds.) $= S_3$, say.

Now on the basis of the above mentioned hyp. of homogeneity,
the three independent estimates of the variance ($\sigma^2$) do not differ
significantly. These estimates of variance are obtained by dividing
the sums of squares by their respective d.f. The significance of
these estimates can be tested by using the 'F'-test. If any estimate
compared with that of the error differs significantly at a given level,
the corresponding factor of classification is said to have an
influence on the variate-values. The results are summarized in the
following analysis of variance table.


ANOVA

Source of     DF                          SS        MS                    F             F.05
variation

Breed         $k - 1 = \nu_1$             $S_1$     $S_1/\nu_1 = V_1$     $V_1/V_E$     ...

Lact. pd.     $m - 1 = \nu_2$             $S_2$     $S_2/\nu_2 = V_2$     $V_2/V_E$     ...

Error         $(k - 1)(m - 1) = \nu_3$    $S_3$     $S_3/\nu_3 = V_E$     -             -

Totals        $n - 1$                     $S$       -                     -             -

Finally, if both or any one 'F' comes out to be significant, we
reject the hyp. at 5% level corresponding to both or any one factor;
otherwise we conclude that there is no evidence against the hyp. at
5% level of significance.
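
The two-way partition above can be sketched as follows. This is a minimal illustration, not the text's own worked example: the function name `two_way_anova` and the 3 x 4 table of milk-yields are hypothetical, with one observation per breed and lactation-period as the section assumes.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell) for a k x m table given as a list of rows."""
    k, m = len(table), len(table[0])
    n = k * m
    grand_total = sum(sum(row) for row in table)                   # G
    cf = grand_total ** 2 / n                                      # correction factor G^2/n
    tss = sum(y * y for row in table for y in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / m - cf         # sum T_i^2/m - CF
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(m)]
    ss_cols = sum(b * b for b in col_totals) / k - cf              # sum B_j^2/k - CF
    ss_error = tss - ss_rows - ss_cols
    df_rows, df_cols, df_error = k - 1, m - 1, (k - 1) * (m - 1)
    ms_error = ss_error / df_error
    return {
        "tss": tss, "ss_rows": ss_rows, "ss_cols": ss_cols, "ss_error": ss_error,
        "df": (df_rows, df_cols, df_error),
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
    }

# Hypothetical milk-yields: 3 breeds (rows) by 4 lactation-periods (columns).
milk = [[10, 12, 14, 12],
        [14, 15, 17, 18],
        [9, 11, 12, 12]]
res = two_way_anova(milk)
print(res)
```

Each F value is then compared with the tabulated F.05 for its own degrees of freedom, here (2, 6) for breeds and (3, 6) for lactation-periods.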

6.5.3 Three-way classification : In this type of classification,
the data is classified with respect to three factors. The total
observations 'n' of the data may be classified into 'k' different
classes with respect to some one criterion of classification, into 'm'
different classes with respect to some other criterion, and into 'p'
different classes with respect to the remaining third criterion of
classification. For example, n cows of k different breeds, m different
lactation-periods and p different age-groups may be
classified into k, m, p classes respectively according to the said
factors of classification. But here we shall consider the particular
case where the no. of classes with respect to each of the three
factors is the same, i.e. k = m = p. This type of arrangement is done
in Latin Squares, where the rows denote the classes with respect to the
first factor, the columns with respect to the second factor and the latin
letters denote the classes with respect to the third factor of
classification. In a latin square, the observations are so arranged
that each letter occurs only once in each row and each column.
Also, a latin square with k rows and k columns is called a 'k x k'
Latin Square, or a Latin Square of order 'k'. If for the above said
example, we take for convenience k = 3, then the milk-yields denoted
by $y_{ij}$ ($i = 1, 2, \ldots, k$ for breed no., $j = 1, 2, \ldots, k$ for lactation-period no.),
along with a latin letter, can be arranged in the following tabular form.

Breed No.        Lact. pd. no. and age grp.          Totals = $R_i$
                   1            2            3

1                  A            B            C            $R_1$
                $y_{11}$     $y_{12}$     $y_{13}$

2                  B            C            A            $R_2$
                $y_{21}$     $y_{22}$     $y_{23}$

3                  C            A            B            $R_3$
                $y_{31}$     $y_{32}$     $y_{33}$

Totals = $C_j$    $C_1$        $C_2$        $C_3$           G

Age grp. totals = $T_a$ :   $T_A$   $T_B$   $T_C$

In this table, $R_i$ stands for the total milk-yield of the i-th
row corresponding to the i-th breed; $C_j$ for the total milk-yield of the
j-th column corresponding to the j-th lactation-period; $T_A$ for the
total milk-yield of the cows of 'A'-age-group; and G for the
grand total of the milk-yields of all the cows such that
$G = \sum_i R_i = \sum_j C_j = \sum_a T_a$, and $n = k^2$.

It is quite clear that the symbol $y_{11}$ (A) represents the milk-yield
of the cow which is of breed no. 1, lactation-period no. 1 and
age-group 'A'. Similar meanings are attached to the remaining
symbols used in the table.

Here we take the hyp. ($H_0$) that the milk-yields are not
affected by the breeds, lactation-periods and the age-groups of the
cows. The sources of variation are obviously (i) breed, (ii) lactation-period,
(iii) age-group, and (iv) error. The required sums of squares
are computed as shown below:

(1) TSS $= \sum_i \sum_j y_{ij}^2 - CF = S$, say; where $CF = G^2/n$.

(2) SS (breeds, or rows) $= \sum_i R_i^2/k - CF = S_1$, say.

(3) SS (lact. pds., or cols.) $= \sum_j C_j^2/k - CF = S_2$, say.

(4) SS (age grps., or letters) $= \sum_a T_a^2/k - CF = S_3$, say.

(5) SS (error) = TSS - SS (rows + cols. + letters) $= S_4$, say.

Now on the basis of the above mentioned hyp. of homogeneity,
the four independent estimates of the variance ($\sigma^2$) do not differ
significantly. These estimates of variance are obtained by dividing
the sums of squares by their respective degrees of freedom. The
significance of these estimates can be tested by using the 'F'-test.
If any estimate compared with that of the error differs significantly
at a given level, the corresponding factor of classification is said
to have an influence on the variate-values. The results are
summarized in the following analysis of variance table.

ANOVA

Source of     DF                          SS        MS                    F             F.05
variation

Breed         $k - 1 = \nu_1$             $S_1$     $S_1/\nu_1 = V_1$     $V_1/V_E$     ...

Lact. pd.     $k - 1 = \nu_2$             $S_2$     $S_2/\nu_2 = V_2$     $V_2/V_E$     ...

Age grp.      $k - 1 = \nu_3$             $S_3$     $S_3/\nu_3 = V_3$     $V_3/V_E$     ...

Error         $(k - 1)(k - 2) = \nu_4$    $S_4$     $S_4/\nu_4 = V_E$     -             -

Totals        $n - 1$                     $S$       -                     -             -

Finally, if one or more values of 'F' come out to be significant,
we reject the hyp. at 5% level; otherwise we conclude that there
is no evidence against the hyp. at 5% level of significance.

Exp. (1) For the following data, prepare the analysis of variance
table and test the significance of the difference between the yields of
the three varieties.

Variety       Yields in lbs.

A :   10   12    8   10

B :   13   10   11

C :    9   12   13   14   10.

Sol. ($H_0$) : The three varieties A, B and C are not significantly
different as regards their yielding-capacities.

If $y_{ij}$ ($i = 1, 2, 3$ for variety no., $j = 1, 2, \ldots, n_i$ for plot no.)
denotes the yield (in lbs.) of the j-th plot for the i-th variety, the yields of
all the plots can be arranged in the following (one-way classification)
tabular form.

Variety              PLOT No.                Totals
No.           1     2     3     4     5      = $T_i$

1. A         10    12     8    10     -        40

2. B         13    10    11     -     -        34

3. C          9    12    13    14    10        58

Total                                        132 = G

Now we compute:

CF $= G^2/n = (132)^2/12 = 1452$,

TSS $= \sum_i \sum_j y_{ij}^2 - CF = 1488 - 1452 = 36$,

SS (varieties) $= \sum_i T_i^2/n_i - CF$
$= (40)^2/4 + (34)^2/3 + (58)^2/5 - 1452 = 1458.13 - 1452 = 6.13$,

SS (error) = TSS - SS (varieties) $= 36 - 6.13 = 29.87$.

ANOVA

Source of     DF     SS        MS       F       F.05
variation

Variety        2      6.13     3.065    < 1     4.26

Error          9     29.87     3.32     -       -

Totals        11     36.00     -        -       -

Conclusion : Since 'F' < $F_{.05}(2, 9)$, showing its insignificance,
there is no evidence against the hyp. and we conclude that the
varieties do not differ significantly as regards their yielding capacities.

Exp. (2) A certain company had four salesmen, A, B, C and
D, each of whom was sent for a week into three types of areas:
country area 'K', outskirts of city 'O' and shopping centre of a
city 'S'. The sales in pounds per week are shown below.

Area        Sales in pounds / week

            A      B      C      D

K :        30     70     30     30

O :        80     50     40     70

S :       100     60     80     80.

Carry out an analysis of variance and interpret the result stating
the limitations under which your conclusions are valid.
(ICAR, Delhi 1966)

Sol. ($H_0$) : There is no significant difference between the sales
of the four salesmen and the three types of areas.

Since the figures of the data are large numbers, we can
use the deviations from some convenient assumed mean, say
y = 60. If $y_{ij}$ ($i = 1, 2, 3$ for area no., $j = 1, 2, 3, 4$ for salesman no.)
denotes the sale (in pounds/week) of the j-th salesman for the i-th area, then
the sales of all the places can be arranged in the following (two-way
classification) tabular form.

Area               SALESMAN No.               Totals
No.         1. A    2. B    3. C    4. D      = $T_i$

1. K        -30      10     -30     -30        -80

2. O         20     -10     -20      10          0

3. S         40       0      20      20         80

Totals       30       0     -30       0        0 = G
= $B_j$

Now we compute:

CF $= G^2/n = (0)^2/12 = 0$,

TSS $= \sum_i \sum_j y_{ij}^2 - CF = 6200 - 0 = 6200$,

SS (areas) $= \sum_i T_i^2/m - CF = [(-80)^2 + (0)^2 + (80)^2]/4 - 0 = 3200$,

SS (salesmen) $= \sum_j B_j^2/k - CF = [(30)^2 + (0)^2 + (-30)^2 + (0)^2]/3 - 0 = 600$,

SS (error) = TSS - SS (areas + salesmen) = 2400.

ANOVA

Source of     DF     SS       MS       F                F.05
variation

Area           2     3200     1600     1600/400 = 4     5.14

Salesman       3      600      200     200/400 < 1      8.94

Error          6     2400      400     -                -

Totals        11     6200     -        -                -

Conclusion : The insignificant values of F show that there
is no evidence against the hyp. at 5% level, and thus we conclude
that there is no significant difference between the sales of the four
salesmen and three types of areas.

Exp. (3) In an experiment on the spacing of millet, four
spacings were used (A = 2", B = 4", C = 6" and D = 8"), and yields were
arranged in a latin square. The experimental arrangement with
yields in grams/plot is shown below:

B      D      A      C
249    245    249    244

A      C      D      B
254    249    240    252

D      B      C      A
245    254    250    257

C      A      B      D
251    261    254    246.

Construct an analysis of variance and test for the variation between
the spacings.

Sol. ($H_0$) : There is no significant difference between the yields
of the plots due to four rows, columns and spacings.

Since the figures of the data are large numbers, we can
use the deviations from some convenient assumed mean, say y = 250.
If $y_{ij}$ ($i = 1, 2, 3, 4$ for row no., $j = 1, 2, 3, 4$ for col. no.) denotes
the yield (in gms/plot) of the j-th column for the i-th row, and the letters A, B,
C and D stand for spacings, then the yields of all the plots can be
arranged in the following (three-way classification) tabular form.

Row No.                Column No.                    Totals = $R_i$
              1         2         3         4

1           B  -1     D  -5     A  -1     C  -6          -13

2           A   4     C  -1     D -10     B   2           -5

3           D  -5     B   4     C   0     A   7            6

4           C   1     A  11     B   4     D  -4           12

Totals        -1         9        -7        -1          0 = G
= $C_j$

Spacing totals = $T_a$ :  A = 21,  B = 9,  C = -6,  D = -24

Now we compute:

CF $= G^2/n = (0)^2/16 = 0$,

TSS $= \sum_i \sum_j y_{ij}^2 - CF = 428$,

SS (rows) $= \sum_i R_i^2/k - CF = 93.5$,

SS (cols.) $= \sum_j C_j^2/k - CF = 33.0$,

SS (spacings) $= \sum_a T_a^2/k - CF = 283.5$,

SS (error) = TSS - SS (rows + cols. + spacings)
$= 428 - (93.5 + 33.0 + 283.5) = 18$.

ANOVA

Source of     DF     SS        MS        F        F.05
variation

Rows           3      93.5     31.17     10.39    4.76

Columns        3      33.0     11.00      3.67    4.76

Spacings       3     283.5     94.50     31.50    4.76

Error          6      18.0      3.00     -        -

Totals        15     428.0     -         -        -

Conclusion : The significant values of F for rows and spacings show
that the hyp. is rejected at 5% level for these two factors, while
there is no evidence against it for columns. Thus we conclude that
there is a significant difference
between the yields of the plots due to the row-means and the
spacing-means.
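
The Latin-square arithmetic of Exp. (3) can be re-checked with a short sketch. The function name `latin_square_anova` is hypothetical, and the two A-plots in rows 2 and 3 are taken as 254 and 257: that reading is an assumption, chosen because it reproduces the sums of squares quoted in the solution (TSS = 428, rows = 93.5, cols. = 33.0, spacings = 283.5).

```python
def latin_square_anova(letters, yields):
    """Latin-square ANOVA for a k x k layout of letters and matching yields."""
    k = len(yields)
    n = k * k
    grand_total = sum(sum(row) for row in yields)                    # G
    cf = grand_total ** 2 / n                                        # correction factor G^2/n
    tss = sum(y * y for row in yields for y in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in yields) / k - cf          # sum R_i^2/k - CF
    col_totals = [sum(yields[i][j] for i in range(k)) for j in range(k)]
    ss_cols = sum(c * c for c in col_totals) / k - cf                # sum C_j^2/k - CF
    letter_totals = {}
    for i in range(k):
        for j in range(k):
            letter_totals[letters[i][j]] = letter_totals.get(letters[i][j], 0) + yields[i][j]
    ss_letters = sum(t * t for t in letter_totals.values()) / k - cf  # sum T_a^2/k - CF
    ss_error = tss - ss_rows - ss_cols - ss_letters
    df_factor, df_error = k - 1, (k - 1) * (k - 2)
    ms_error = ss_error / df_error
    return {
        "tss": tss, "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_letters": ss_letters, "ss_error": ss_error,
        "F_rows": (ss_rows / df_factor) / ms_error,
        "F_cols": (ss_cols / df_factor) / ms_error,
        "F_letters": (ss_letters / df_factor) / ms_error,
    }

# Layout and yields (grams/plot) of Exp. (3); 254 and 257 are the assumed readings.
layout = [["B", "D", "A", "C"],
          ["A", "C", "D", "B"],
          ["D", "B", "C", "A"],
          ["C", "A", "B", "D"]]
grams = [[249, 245, 249, 244],
         [254, 249, 240, 252],
         [245, 254, 250, 257],
         [251, 261, 254, 246]]
anova = latin_square_anova(layout, grams)
print(anova)
```

The sketch works on the raw yields rather than the deviations from 250; since the sums of squares do not depend on the origin, both routes give the same table.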

Exercise VI 

1. To test the significance of variation of the retail prices
of a certain commodity in the four principal cities (Bombay, Calcutta,
Madras and Delhi), seven shops were chosen at random in each city
and the prices observed were as follows:

City          Prices in np. at the shops

Bombay   :  82  79  73  69  69  63  61

Calcutta :  84  82  80  79  76  68  62

Madras   :  79  77  76  74  72  68  64

Delhi    :  88  84  80  68  68  66  66.

Do the data indicate that the prices in the four cities are
significantly different ? Tabulate the results properly for the study
of variation both between cities and within cities.

2. The plants of wheat of four varieties were selected
at random and the heights of their shoots were measured in cms.:

Variety       Heights in cms.

1 :   40   42   41   43   45

2 :   42   41   44   45   43

3 :   39   42   43   40   44

4 :   37   34   38   35   37.

Do the data indicate that there is no significant difference between
the mean heights of the plants of the four varieties ?

3. Four varieties of potato are planted, each on five plots
of ground of the same size and type; and each variety is treated
with five different fertilizers. The yields in tons/plot are as follows:

                  Fertilizers

Variety     1      2      3      4      5

1 :        1.9    2.2    2.6    1.8    2.1

2 :        2.6    1.9    2.3    2.6    2.2

3 :        1.7    1.9    2.2    2.0    2.1

4 :        2.1    1.8    2.6    2.3    2.4.

Perform an analysis of variance and show whether there is any
significant difference between the yields of four varieties or due to
five fertilizers.

4. [a] Define "analysis of variance". Give its assumptions
and also the applications.

[b] Six varieties of wheat were tested in the four blocks of a
field. The yields in kgms/plot and the layout are given below.

Block                  Layout and yields/plot

I   :    W t     V,      V a     V 4     V 5     V 4
         17.8    17.7    20.6    6.2     6.2     14.9

II  :    V s     V 2     V 4     V 4     V 9     V*
         12.7    18.8    17.3    6.0     12.5    7.0

III :    V #     V 4     V,      V a     Vs      Vs
         16.3    9.6     28.6    26.8    29.5    6.4

IV  :    V 5     V 3     V 4     V.      V 3     V 6
         7.7     21.0    18.6    4.1     24.9    12.6.

Prepare the analysis of variance table and test whether the varieties
and the blocks differ significantly in regard to their average-yields.

5. You are given the results of the following cacao
manurial experiment conducted in a 3 x 3 latin square. The fertilizers
used were: O = no manure (control), A = one lb. of super phosphate/plant
and B = two lbs. of super phosphate/plant. The layout and
yields in lbs./plant from -jy acre plots are given below.

O     A     B
14    40    24

B     O     A
23    19    31

A     B     O
20    21    11.

Carry out the analysis of variance and test the variation between
the fertilizers.


four varieties of gram tested in a 4x4 latin square. 


A        B        C        D
105      112      113.1    108

B        D        A        C
113.6    110.6    114      112

C        A        D        B
114      108.5    110      113

D        C        B        A
107.5    116.6    111.1    111.6.

Carry out the analysis of variance and test the homogeneity of the
given data.

ANSWERS 

(1) $V_1$ = 31.67, $V_E$ = 60.25 (2) F = 13.98 (3) $V_1$ = 0.0963,
$V_2$ = 0.1166, $V_E$ = 0.0668 (4) [b] $F_1$ = 21.98, $F_2$ = 6.77 (5) $F_1$ = 10.7,
$F_2$ = 6.9, $F_3$ = 4.15 (6) $V_R$ = 6.09, $V_C$ = 3.13, $V_T$ = 19.18, $V_E$ = 5.68.



Chapter VII

Correlation And Regression


7.1 Meanings and definition of correlation : A number
of statistical problems arise in exact and social sciences in which
the sample drawn from a bivariate normal population consists
of pairs of measurements (x, y). In all such problems, these two
variables (x, y) are found to behave in such a manner that the
change in one brings the change in the other. This type of
phenomenon or relationship between any two variables is called
correlation and such variables are said to be correlated. Thus the
correlation may be defined as the relationship between the two
variables when the change in one variable is on an average
accompanied by the change in the other variable in the same or
opposite directions. For example, the two variables volume and
pressure of a perfect gas at some constant temperature, according
to Boyle's Law, are so related that the volume increases with the
decrease in pressure or vice versa, and hence these two variables
are said to have the correlation.

7.1.1 Direction of correlation : By direction of correlation
we mean the sign of correlation. On the basis of the direction, the
correlation is of the following two types: (i) positive correlation,
and (ii) negative correlation.

(i) Positive correlation : The correlation between the two
variables is said to be (+)ve if the changes in both the variables
are observed in the same direction. It clearly means that
both the variables either increase or decrease simultaneously. This type
of correlation is found between the total cultivable area and
the area under wheat; the amount of production of a crop and the
amount of fertilizer applied to it; the demand and the price of a
certain commodity; the age and the height of a child; the radius
and the circumference or area of a circle; the volume and the
temperature of a perfect gas at some constant pressure according
to Charles's law etc.

(ii) Negative correlation : The correlation between the
two variables is said to be (-)ve if the changes in the two variables
are observed in opposite directions. It clearly means that if one
variable is increasing, the other is decreasing, and vice versa.
This type of correlation is found between the areas under fodder
and grain-crops; the supply and the price of a certain commodity;
the total income and the proportion of it spent on food; the volume
and the pressure of a perfect gas at some constant temperature
according to Boyle's Law etc.
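
The direction of correlation can also be read off numerically from paired measurements. The sketch below is illustrative only: it uses the sign of the sum of products of deviations from the two means, a quantity this section has not yet defined, and the function name `direction_of_correlation` and the gas readings are hypothetical.

```python
def direction_of_correlation(xs, ys):
    """Classify the direction of correlation from paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when x and y move together,
    # negative when they move in opposite directions.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_variation > 0:
        return "positive"
    if co_variation < 0:
        return "negative"
    return "none"

# Hypothetical readings, chosen only for illustration.
pressure = [1.0, 2.0, 4.0, 8.0]
volume_boyle = [8.0 / p for p in pressure]          # Boyle's law: volume falls as pressure rises
temperature = [250.0, 300.0, 350.0, 400.0]
volume_charles = [0.01 * t for t in temperature]    # Charles's law: volume rises with temperature

boyle_direction = direction_of_correlation(pressure, volume_boyle)
charles_direction = direction_of_correlation(temperature, volume_charles)
print(boyle_direction, charles_direction)
```

The sign alone gives the direction; the degree of correlation discussed next needs a measure of magnitude as well.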

7.1 2 Degree of correlation : By degree of, correlation 
we mean the magnitude or extent of correlation. On the basis of 
the degree i.e. the rato of changes in the two variables; the 
correlation is of the following two typ s. 


Degree of correlation

(i) Perfect correlation.

(ii) Limited correlation.

(i) Perfect correlation : The correlation between the two
variables is said to be perfect if the ratio of changes in both the
variables remains the same throughout. It clearly means that the
percentage-change in one variable is accompanied by the same
percentage-change in the other variable in the same or opposite
directions. This type of correlation is found between the radius
and the circumference of a circle; the volume and the pressure of
a perfect gas at some constant temperature, etc.

(ii) Limited correlation : The correlation between the two
variables is said to be limited if the ratio of changes in the two
variables does not remain the same throughout. It clearly means
that the percentage-change in one variable is not accompanied by
the same percentage-change in the other variable in the same or
opposite directions. This type of correlation is found between the
demand and the price of a certain commodity; the areas under
fodder and grain crops, etc.


7.1.3 Degree and direction of correlation : By degree and direction
of correlation we mean both the magnitude (or extent) and the
sign of correlation. On the basis of the degree as well as the direction
the correlation is of the following four types.

Degree and direction of correlation

(i) Perfect positive correlation

(ii) Limited positive correlation

(iii) Perfect negative correlation

(iv) Limited negative correlation

(i) Perfect positive correlation : The correlation between the
two variables is said to be perfect positive if the direction and the
ratio of changes in both the variables remain the same throughout.
This type of correlation is found between the volume and the
temperature of a perfect gas at some constant pressure.

(ii) Limited positive correlation : The correlation between
the two variables is said to be limited positive if the direction of
changes in the two variables remains the same throughout but the
ratio of changes differs. This type of correlation is found between
the total cultivable area and the area under wheat.

(iii) Perfect negative correlation : The correlation between
the two variables is said to be perfect negative if the direction of
changes in the two variables differs but the ratio of changes
remains the same throughout. This type of correlation is found
between the volume and the pressure of a perfect gas at some
constant temperature.

(iv) Limited negative correlation : The correlation between
the two variables is said to be limited negative if both the direction
and the ratio of changes in the two variables differ throughout.
This type of correlation is found between the areas under fodder
and grain-crops.


7.1.4 No correlation : The two variables are said to have
no correlation (or to be uncorrelated) if the change in one variable
does not affect the other variable. For example, the no. of radio
sets produced and the total no. of births recorded during a certain
period have no relationship with each other, and hence the two
have no correlation.


7.2 Measures of correlation : The direction and the
magnitude of correlation between the two variables can be found
by using either one of the mathematical methods or a
diagrammatic one.

7.2.1 Mathematical methods.

(a) Pearson's coefficient of correlation.

(b) Spearman's coefficient of correlation.

(c) Coefficient of concurrent deviations.

(d) Coefficient of correlation by least-squares.

7.2.1 (a) Pearson's coefficient of correlation : This measure
of correlation, obtained by Prof. Karl Pearson, is based on the
arithmetical descriptions. It is usually denoted by the symbol 'r'.
This measure is quite capable of expressing the direction and the
exact amount of linear relationship between the two variables
under study. It has been seen that the values of 'r' always lie
between -1 and +1. Further we note that for a value r = +1, there
is perfect (+)ve correlation; for r = -1, there is perfect (-)ve
correlation; and for r = 0, there is no correlation between the two
variables. Also for a value of r lying between 0 and 1, there is
limited (+)ve correlation while for r lying between 0 and -1, there
is limited (-)ve correlation between the two variables. If we
consider a random sample of n pairs in (x, y) drawn from a
bivariate normal population, then Pearson's coefficient of correlation
or the product moment coefficient of correlation is given by the
formula:

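$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y},$$

where $\operatorname{cov}(x, y) = \sum (x - \bar{x})(y - \bar{y})/n$ stands
for the sample-covariance of x and y, and
$\sigma_x = \sqrt{\sum (x - \bar{x})^2 / n}$ and
$\sigma_y = \sqrt{\sum (y - \bar{y})^2 / n}$ stand for the sample s.ds.
of x and y respectively. Thus by using this direct method, we have

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}.$$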

For the sake of convenience in computations, especially when
the actual means $\bar{x}$, $\bar{y}$ come out to be in decimals, we
can also use a short cut method to compute 'r' as

$$r = \frac{\sum \xi\eta - (\sum \xi)(\sum \eta)/n}{\sqrt{\{\sum \xi^2 - (\sum \xi)^2/n\}\,\{\sum \eta^2 - (\sum \eta)^2/n\}}},$$

where $\xi$, $\eta$ stand for deviations or step-deviations of the
variates x and y from their respective assumed means $A_x$ and
$A_y$. This method is very common in practice and it consists of
the following main steps.

(i) Prepare a blank table with eight columns in it.

(ii) Use col. I for pair no.; II for x; III for the deviation
$\xi = (x - A_x)$ or the step-deviation $\xi = (x - A_x)/i$ if i be the
common factor; IV for $\xi^2$; V for y; VI for the deviation
$\eta = (y - A_y)$ or the step-deviation $\eta = (y - A_y)/i$ if i be the
common factor; VII for $\eta^2$; and VIII for the product $\xi\eta$.

(iii) Complete the entries of all the eight columns against each
pair number.

(iv) In the bottom of the table, get the totals of all the cols.
except the cols. no. I, II, V, and substitute them in the formula to
compute the value of 'r'.
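
The tabular steps above reduce to five column totals. The following is a minimal Python sketch of the short-cut method, given only as an illustration; the function name `pearson_r_shortcut` and its parameter names are assumptions and do not come from the text. The sample values are the ages used in Exp. (1).

```python
from math import sqrt


def pearson_r_shortcut(xs, ys, a_x=0.0, a_y=0.0, step_x=1.0, step_y=1.0):
    """Pearson's r from deviations (or step-deviations) about assumed means.

    a_x, a_y are the assumed means; step_x, step_y are optional common
    factors for step-deviations. All names here are illustrative.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    n = len(xs)
    xi = [(x - a_x) / step_x for x in xs]             # col. III
    eta = [(y - a_y) / step_y for y in ys]            # col. VI
    sum_xi = sum(xi)
    sum_eta = sum(eta)
    sum_xi2 = sum(u * u for u in xi)                  # total of col. IV
    sum_eta2 = sum(v * v for v in eta)                # total of col. VII
    sum_xieta = sum(u * v for u, v in zip(xi, eta))   # total of col. VIII
    numerator = sum_xieta - sum_xi * sum_eta / n
    denominator = sqrt((sum_xi2 - sum_xi ** 2 / n) * (sum_eta2 - sum_eta ** 2 / n))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]
print(round(pearson_r_shortcut(husband, wife, a_x=31, a_y=25), 4))
```

Because r is independent of the change of origin and scale, any choice of assumed means or common factors should give the same value; the assumed means only make the hand arithmetic lighter.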

7.2.1 (a-1) Assumptions for Pearson's coefficient of correlation :

(i) The two variables are linearly related.

(ii) The variables are affected by a no. of independent causes.

(iii) The independent causes, affecting the variables, have
some inter-relationship between the mutual causes and effects.

(iv) The variables are random.

(a-2) Properties of Pearson's coefficient of correlation :

(i) It gives us the direction as well as the magnitude of
correlation between the two variables.

(ii) Its value always lies between -1 and +1.

(iii) It is independent of the change of origin and scale.

(iv) It is symmetrical in the two variables, i.e. $r_{xy} = r_{yx}$.

(v) It is rigidly defined and hence free from human bias.

(vi) It depends upon all the observations of the data.

(vii) It is a pure number and hence free from any of the units.

(viii) It bears the same sign as that of cov (x, y).

(a-3) Limitations of Pearson's coefficient of correlation :

(i) It is difficult to compute. 

(ii) Its meanings are not easily understood. 

(iii) It does not show whether there is any relation between 
the causes and effects producing the correlation. 

(a-4) Applications of Pearson's coefficient of correlation :

(i) It can be used to find the relation between the two variables. 

(ii) It can be used to determine the two regression coefficients, 
the angle between the two regression lines, the standard errors of 
estimates, and the covariance between the two variables provided 
the s.ds. of the variables are known. 

(iii) It can also be used as a measure of linearity between the 
two variables. 

Exp. (1) Calculate the coefficient of correlation for the
following ages of husband and wife:

Husband's age (x yrs.) : 23 27 28 29 30 31 33 35 36 39

Wife's age (y yrs.)    : 18 22 23 24 25 26 28 29 30 32.

Sol. The following table shows the calculation of r.


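Pair | x  | ξ = x - A_x | ξ²  | y  | η = y - A_y | η²  | ξη
No.  |    | (A_x = 31)  |     |    | (A_y = 25)  |     |
-----+----+-------------+-----+----+-------------+-----+-----
1    | 23 |     -8      |  64 | 18 |     -7      |  49 |  56
2    | 27 |     -4      |  16 | 22 |     -3      |   9 |  12
3    | 28 |     -3      |   9 | 23 |     -2      |   4 |   6
4    | 29 |     -2      |   4 | 24 |     -1      |   1 |   2
5    | 30 |     -1      |   1 | 25 |      0      |   0 |   0
6    | 31 |      0      |   0 | 26 |      1      |   1 |   0
7    | 33 |      2      |   4 | 28 |      3      |   9 |   6
8    | 35 |      4      |  16 | 29 |      4      |  16 |  16
9    | 36 |      5      |  25 | 30 |      5      |  25 |  25
10   | 39 |      8      |  64 | 32 |      7      |  49 |  56
-----+----+-------------+-----+----+-------------+-----+-----
Totals    |   Σξ = 1    | 203 |    |   Ση = 7    | 163 | 179

Now we have

$$r = \frac{\sum \xi\eta - (\sum \xi)(\sum \eta)/n}{\sqrt{\{\sum \xi^2 - (\sum \xi)^2/n\}\,\{\sum \eta^2 - (\sum \eta)^2/n\}}}$$

$$= \frac{179 - 1 \times 7/10}{\sqrt{\{203 - (1)^2/10\}\,\{163 - (7)^2/10\}}} = \frac{179 - 0.7}{\sqrt{\{203 - 0.1\}\,\{163 - 4.9\}}} = \frac{178.3}{\sqrt{202.9 \times 158.1}} \approx +0.9955.$$

Thus there is (+)ve correlation of high degree between the
ages of husband and wife. Ans.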


(b) Spearman's coefficient of correlation : This measure of
correlation, obtained by Prof. Charles Spearman, is also based on the
arithmetical descriptions. It gives the correlation between the ranks
(or grades) assigned to the two characters under study and hence
may also be called the rank correlation coefficient. It is
computed in the situations when the numerical measurements on the
characters are difficult but their grading is easy with regard to some
criterion; or when we want to know the relationship between the
proficiencies of a group of candidates in the two different subjects;
or when we want to investigate the degree of agreement between
the two judges who have graded the same individuals regarding the
same characteristic.

It is also denoted by the symbol r. This measure is also
capable of expressing the direction and the amount of
relationship between the two characters under study. Its value also
lies between -1 and +1. If we consider a random sample of n pairs
in (x, y) drawn from a bivariate normal population, then Spearman's
coefficient of correlation is given by the formula:

$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)},$$

where d stands for the difference between the
two ranks of the same individual. Here it is assumed that no two
or more individuals receive the same rank for a character. If two or
more individuals possess the same magnitude for a character, they
receive the average rank determined on the basis as if their
magnitudes are slightly different from each other. An individual
possessing the highest magnitude of a character is usually assigned
rank 1; the next lower, rank 2; and so on, until the individual with
the lowest magnitude is assigned the highest rank for the character,
which is the same as n, the no. of pairs in the sample. This method of
finding the rank correlation coefficient between the two characters
consists of the following main steps:

(i) Prepare a blank table with seven columns in it.

(ii) Use col. I for pair no.; II for x; III for rank of x; IV for
y; V for rank of y; VI for rank difference 'd'; and VII for $d^2$.

(iii) Complete the entries of all the seven columns against
each pair number.

(iv) In the bottom of the table, get the total of the last col.
only and substitute it in the formula to compute the value of r.

Note : If in a problem the direct ranks are given for the
individuals of the sample, then we need not include the cols. II,
IV in the above table. The assumptions, properties, limitations and
the applications of the rank correlation coefficient are the same as
those stated for Pearson's coefficient of correlation.
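
The ranking rule and the formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `average_ranks` and `spearman_r` and the small data set are assumptions, not part of the text. Tied values share the average of the ranks they would otherwise occupy, and the formula is applied without any further correction for ties, as the text describes.

```python
def average_ranks(values):
    """Rank 1 goes to the highest value; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2   # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_r(xs, ys):
    """Rank correlation r = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical scores given by two judges to four candidates.
judge_a = [86, 70, 70, 55]
judge_b = [9, 8, 6, 7]
print(average_ranks(judge_a))
print(spearman_r(judge_a, judge_b))
```

When the ranks are supplied directly, only the last two lines of `spearman_r` are needed, which corresponds to dropping cols. II and IV of the table.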

Exp. (2) Ten students got the following percentage of marks
in Economics and Statistics:

Student        :  1  2  3  4  5  6  7  8  9 10

Marks in Eco.  : 78 36 98 25 75 82 90 62 65 39

Marks in Stat. : 84 51 82 45 82 62 82 58 53 47.

Calculate the coefficient of rank correlation.


Sol. The computation of rank correlation coefficient is shown
in the following table:

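Student | Marks in | Rank | Marks in | Rank | Rank diff. | d²
No.     | Eco. (x) | of x | Stat. (y)| of y |     d      |
--------+----------+------+----------+------+------------+----
1       |    78    |   4  |    84    |   1  |      3     |  9
2       |    36    |   9  |    51    |   8  |      1     |  1
3       |    98    |   1  |    82    |   3  |     -2     |  4
4       |    25    |  10  |    45    |  10  |      0     |  0
5       |    75    |   5  |    82    |   3  |      2     |  4
6       |    82    |   3  |    62    |   5  |     -2     |  4
7       |    90    |   2  |    82    |   3  |     -1     |  1
8       |    62    |   7  |    58    |   6  |      1     |  1
9       |    65    |   6  |    53    |   7  |     -1     |  1
10      |    39    |   8  |    47    |   9  |     -1     |  1
--------+----------+------+----------+------+------------+----
Total   |          |      |          |      |            | 26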


Now we have

$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 26}{10(100 - 1)} = 1 - \frac{156}{990} = 1 - 0.16 = +0.84.$$


Thus there is high (+)ve correlation between the marks
obtained in Economics and Statistics. Here we may also conclude
that the students who are good in Eco. are also good in Statistics.
Ans.

7.2.2 Diagrammatic methods :

Diagrammatic methods

(a) Scatter diagram.

(b) Regression lines.

(c) Simple graphs.

(a) Scatter diagram : This measure of correlation does not
require any arithmetical descriptions but is simply based on the nature
of the diagram obtained on a graph paper. If we consider a random
sample of n pairs in (x, y) drawn from a bivariate normal population
and plot these points on a graph paper, then the diagram of
dots (or points) thus obtained is called scatter diagram.

Merely an eye-inspection of this diagram is sufficient to decide
the degree and direction of correlation between the two variables:

(1) If all the points on a scatter diagram lie on a straight
line with (+)ve slope, the correlation will be perfect (+)ve
[see fig. (1)].

(2) If the points on a scatter diagram seem to cluster about
the diagonal with (+)ve slope, the correlation will be limited (+)ve
[see fig. (2)].

(3) If all the points on a scatter diagram lie on a straight line
with (-)ve slope, the correlation will be perfect (-)ve [see fig. (3)].

(4) If the points on a scatter diagram seem to cluster about
the diagonal with (-)ve slope, the correlation will be limited (-)ve
[see fig. (4)].

(5) If the points on a scatter diagram take roughly the
elliptical or circular shape, then the two variables are said to be
uncorrelated [see fig. (5)].

(a-1) Properties of scatter diagram :

(i) It gives us the direction as well as the degree of correlation 
between the two variables. 

(ii) It depends upon all the observations of the data. 

(iii) It is simply a graph and hence free from any arithmetical 
descriptions. 

(iv) It is a very quick measure of correlation. 

(v) It is easy to understand. 

(a-2) Limitations of scatter diagram :

(i) It does not give us the exact amount of correlation but 
merely the degree of correlation i.e. perfect or limited. 

(ii) It is not rigidly defined but depends upon the accuracy 
and skill of the observer and hence not free from human bias. 

(iii) It can be used only for elementary purposes where we 
need simply a rough idea of correlation between the two variables. 

(iv) It does not show whether there is any relation between 
the causes and effects producing the correlation. 

(v) It is unable to give us any idea of correlation for a small 
sample because no specific diagram can be obtained from these 
few points. 

(a-3) Applications of scatter diagram :

(i) It can be used to find the relation between the two
variables.

(ii) It can be used as a pictogram for advertisement purposes
to illustrate the relationship between the two characters.

(iii) It can be used as a quick measure of correlation in all
those situations wherein we require simply the approximate idea of
correlation.

Note : The assumptions for a scatter diagram are the same
as described for Pearson's coefficient of correlation.

7.2.2 (b) Regression lines : The coefficient of correlation between
the two variables can also be found with the help of the two
regression lines. Let us consider a random sample of n pairs in
(x, y) drawn from a bivariate normal population and plot these
points on a graph paper to have the scatter diagram. If there exists
association or relationship between the two variables x and y, the
dots of the scatter diagram may be more or less concentrated
around a line called the line of regression and the relationship thus
exhibited is called the linear regression. Thus a regression line
may also be termed as a regression function of order one. More
precisely, a line of regression is the straight line which gives the
best fit, in the least square sense, to the given frequency distribution.
If a straight line is so chosen that the sum of squares of deviations
parallel to the axis of x is minimum, it is called the line of regression
of x on y and it gives the best estimate of x for a given value of y.
Similarly, if the sum of squares of deviations parallel to the axis
of y is minimized, the resulting straight line is called the line of
regression of y on x and it gives the best estimate of y for a given
value of x. Usually, these two regression lines are different.

Thus the regression concept is concerned with the way in
which the changes in one variable depend upon the simultaneous
changes in the other variable. Out of the two random variables
used for correlation theory, here we choose one as the independent
variable and the other as dependent. The variable whose effect
produces the correlation is treated as independent and the other
as dependent. For example, the correlation between the amount of
rainfall and yield of rice is produced due to the influence of rainfall
on the yield of rice and hence the former is treated as the
independent variable, the latter as dependent. Sometimes both the
variables may take both of the roles. It happens when the correlation
between them is the result of the influence of a third factor. For
example, the (+)ve correlation between the yields of rice and jute
is due to the fact that the two are related to the amount of rainfall.
Hence we see that in a linear regression there may be at the most two
regression lines as

(i) $y - \bar{y} = b_{yx}(x - \bar{x})$, and (ii) $x - \bar{x} = b_{xy}(y - \bar{y})$.

In the above relations, (i) represents the equation of regression
line of y on x which can give the estimated (or average, or expected,
or most likely) value of the dependent variable y for a given value
of the independent variable x. Similarly, (ii) represents the equation
of regression line of x on y which can give the best estimated value
of x for a given value of y. Here $\bar{x}$, $\bar{y}$ denote the sample
means for the variables x, y respectively and $b_{yx}$, $b_{xy}$ stand for
the respective regression coefficients of y on x and of x on y. The
regression coefficient $b_{yx}$ gives the average change in y for a unit
change in x. In the same way, the regression coefficient $b_{xy}$ can be
defined as the average change in x for a unit change in y. These
coefficients of regression can be determined by any of the following
formulae:

(i) $b_{yx} = r \, \dfrac{\sigma_y}{\sigma_x}$,

(ii) $b_{yx} = \dfrac{\operatorname{cov}(x, y)}{\sigma_x^2}$,

(iii) $b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$,

and (iv) $b_{yx} = \dfrac{\sum \xi\eta - (\sum \xi)(\sum \eta)/n}{\sum \xi^2 - (\sum \xi)^2/n}$,

where $\bar{x}$, $\bar{y}$ stand for the sample means; $\sigma_x$, $\sigma_y$ for
the sample s.ds.; and $\xi$, $\eta$ for the deviations of the variables x
and y from their respective assumed means $A_x$ and $A_y$. Similarly,
we may have

(i) $b_{xy} = r \, \dfrac{\sigma_x}{\sigma_y}$,

(ii) $b_{xy} = \dfrac{\operatorname{cov}(x, y)}{\sigma_y^2}$,

(iii) $b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$,

and (iv) $b_{xy} = \dfrac{\sum \xi\eta - (\sum \xi)(\sum \eta)/n}{\sum \eta^2 - (\sum \eta)^2/n}$.

For computing a regression coefficient by the last formula,
the procedure consists of the same steps as stated for Pearson's
coefficient of correlation.
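
As an illustration of how the two regression coefficients and the two lines fit together, here is a minimal Python sketch. It uses the form with sums of products of deviations from the actual means; the function names `regression_coefficients`, `estimate_y` and `estimate_x` are assumptions made for this sketch, and the data are those of Exp. (3).

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) for a sample of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("both variables must vary")
    return mean_x, mean_y, s_xy / s_xx, s_xy / s_yy


def estimate_y(x, mean_x, mean_y, b_yx):
    """Line of y on x: y - mean_y = b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


def estimate_x(y, mean_x, mean_y, b_xy):
    """Line of x on y: x - mean_x = b_xy * (y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)


xs = [1, 2, 3, 4, 5, 6, 7]
ys = [5, 13, 16, 23, 33, 38, 40]
mean_x, mean_y, b_yx, b_xy = regression_coefficients(xs, ys)
print(round(b_yx, 2), round(b_xy, 2))
print(round(estimate_x(20, mean_x, mean_y, b_xy), 2))
```

Note that two separate functions are kept for the two lines on purpose: the line of x on y is not obtained by inverting the line of y on x unless the correlation is perfect.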

The direction and the magnitude of correlation between the
two variables can be determined with the help of the two regression
lines by using either (a) the two regression coefficients, or (b) the
angle between the two regression lines.

7.2.2 (b-a) Determination of r from two regression coefficients :

Given the two regression coefficients, the correlation coefficient can
be determined as the geometric mean of the regression coefficients.
The sign of r is the same as that of either of the two regression
coefficients because $b_{yx}$, $b_{xy}$, r and cov (x, y) bear the same sign
since $\sigma_x$, $\sigma_y$ are always (+)ve.

Thus we have

$$b_{yx} \, b_{xy} = r \frac{\sigma_y}{\sigma_x} \times r \frac{\sigma_x}{\sigma_y} = r^2 \le 1,$$

or $r = \pm (b_{yx} \, b_{xy})^{1/2}$.
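
The relation above can be checked numerically. The sketch below is a minimal illustration, assuming a helper named `r_from_regression_coefficients` (a name invented here); it takes the square root of the product and attaches the common sign of the two coefficients.

```python
from math import copysign, sqrt


def r_from_regression_coefficients(b_yx, b_xy):
    """r as the geometric mean of b_yx and b_xy, carrying their common sign."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("the two regression coefficients must have the same sign")
    if product > 1 + 1e-12:
        raise ValueError("the product of the coefficients cannot exceed unity")
    return copysign(sqrt(min(product, 1.0)), b_yx)


# Coefficients obtained from the sums 172, 28 and 1080 of Exp. (3).
print(round(r_from_regression_coefficients(172 / 28, 172 / 1080), 3))
# Two hypothetical negative coefficients give a negative r.
print(round(r_from_regression_coefficients(-0.8, -0.5), 3))
```

The two checks at the top mirror the facts stated in the text: both coefficients share the sign of the covariance, and their product equals $r^2$.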

(b.a-1) Properties of regression coefficients :

(i) They give us the estimates of changes of the dependent
variables corresponding to unit changes in the independent variables.

(ii) One of the two regression coefficients may be less than
unity and the other greater than unity, but their product must
never exceed unity.

(iii) They are independent of the change of origin, but not
of scale.

(iv) They are not symmetrical in the two variables, i.e.
$b_{yx} \ne b_{xy}$ in general.

(v) They are rigidly defined and hence free from human bias.

(vi) They depend upon all the observations of the data.

(vii) They are not pure numbers: $b_{yx}$ is expressed in units of y
per unit of x, and $b_{xy}$ in units of x per unit of y.

(viii) They bear the same sign as that of the correlation, or 
the covariance term. 

(b.a-2) Applications of regression coefficients :

(i) A regression coefficient can be used to find the amount of
correlation between the two variables x and y provided the sample
s.ds. $\sigma_x$, $\sigma_y$ are known.

(ii) A regression coefficient $b_{yx}$ (or $b_{xy}$) can also be used to
find the covariance between the two variables x and y provided the
sample s.d. $\sigma_x$ (or $\sigma_y$) is known.

(iii) The two regression coefficients together can be used to
determine the coefficient of correlation, the slopes of the regression
lines with the coordinate axes, the angle between the two regression
lines, the standard errors of estimates and the covariance between
the two variables provided the sample s.ds. $\sigma_x$, $\sigma_y$ are known.

(iv) Their product ($= r^2$) can be used as a measure of linearity
between the two variables.

(v) They are also used to determine the two regression
equations (or functions) provided the sample means $\bar{x}$, $\bar{y}$ are
known.

Note : The assumptions and the limitations of the regression
coefficients are the same as stated for Pearson's coefficient of
correlation.

7.2.2 (b-b) Determination of r from the angle between the two
regression lines : If the angle between the two regression lines
plotted on a graph is $\theta$, its magnitude can be used to determine
the amount of correlation between the two variables under study.

If $\theta = 0^\circ$, the correlation is perfect (+)ve; if
$0^\circ < \theta < 90^\circ$, the correlation is limited (+)ve; if
$\theta = 90^\circ$, there is no correlation at all; if
$90^\circ < \theta < 180^\circ$, the correlation is limited (-)ve; and if
$\theta = 180^\circ$, the correlation is perfect (-)ve.

(b.b-1) Plotting the two regression lines on a graph :
Although we need at least four points for plotting the two lines on
a graph, here only three points are sufficient since the point
$M \equiv (\bar{x}, \bar{y})$ is common to both the regression lines, being
their point of intersection. Thus we need to plot only the other two
points, say A and B. If we put x = 0 in the equation of y on x and
find the corresponding value of y, say $y_A$, then we obtain the point
$A \equiv (0, y_A)$. If we join M and A, we get the line of regression
of y on x. Similarly, by putting y = 0 in the equation of x on y and
finding the corresponding value of x, say $x_B$, we obtain the point
$B \equiv (x_B, 0)$. If we join M and B, we get the line of regression
of x on y.

If these two regression lines coincide, the correlation is perfect;
if they cut at right angle, the correlation is zero; and if they diverge
and intersect each other, the correlation is limited. As the
divergence between the two regression lines increases, the correlation
decreases.

Exp. (3) A sample of paired variates is given below:

x : 1 2 3 4 5 6 7

y : 5 13 16 23 33 38 40.

(a) Compute the two regression lines and estimate x for y = 20.

(b) Represent the data in a scatter diagram and comment on
correlation.

(c) Plot the two regression lines on a graph and give an idea
of correlation from the angle between the lines.

Sol. (a) The following table shows the calculations of two
regression coefficients.

Pair | x | ξ = x - 4 | ξ² | y  | η = y - 25 |  η²  | ξη
No.  |   |           |    |    |            |      |
-----+---+-----------+----+----+------------+------+-----
1    | 1 |    -3     |  9 |  5 |    -20     |  400 |  60
2    | 2 |    -2     |  4 | 13 |    -12     |  144 |  24
3    | 3 |    -1     |  1 | 16 |     -9     |   81 |   9
4    | 4 |     0     |  0 | 23 |     -2     |    4 |   0
5    | 5 |     1     |  1 | 33 |      8     |   64 |   8
6    | 6 |     2     |  4 | 38 |     13     |  169 |  26
7    | 7 |     3     |  9 | 40 |     15     |  225 |  45
-----+---+-----------+----+----+------------+------+-----
Totals   |     0     | 28 |    |     -7     | 1087 | 172

Now we have

$$b_{yx} = \frac{\sum \xi\eta - (\sum \xi)(\sum \eta)/n}{\sum \xi^2 - (\sum \xi)^2/n} = \frac{172 - 0}{28 - 0} = 6.14,$$

$$b_{xy} = \frac{\sum \xi\eta - (\sum \xi)(\sum \eta)/n}{\sum \eta^2 - (\sum \eta)^2/n} = \frac{172 - 0}{1087 - (-7)^2/7} = 0.16.$$

Also, $\bar{x} = A_x + \sum \xi / n = 4 + 0 = 4$, and
$\bar{y} = A_y + \sum \eta / n = 25 + (-7)/7 = 24$.

Thus the regression line of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$, or $y - 24 = 6.14 (x - 4)$, i.e.
$y = 6.14x - 0.56$. Ans.

Similarly, the line of regression of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$, or $x - 4 = 0.16 (y - 24)$, i.e.
$x = 0.16y + 0.16$. Ans.

and the estimate of x for y = 20 is
$x = 0.16 \times 20 + 0.16 = 3.36$. Ans.

(b) If we measure the variate x along the axis of x, and y along
the axis of y, then we get the scatter diagram with seven points [(1,5);
(2,13); (3,16); (4,23); (5,33); (6,38); (7,40)] on it. The points seem to
be clustered about a line starting from the lower left hand corner
and going to the upper right hand corner (i.e. a line with
positive slope), hence the correlation between x and y is
limited positive. Since the points are very close to the
diagonal with positive slope, the correlation is of high
degree and has a tendency of approaching the perfect value.

[Figure : scatter diagram of the seven points.]

(c) We have the two regression lines as

$y = 6.14x - 0.56$ ... (i)

and $x = 0.16y + 0.16$ ... (ii),

from (a).

If we put x = 0 in (i), we get $A \equiv (0, -0.56)$, and on putting
y = 0 in (ii), we get $B \equiv (0.16, 0)$. The common point of these
equations is $M \equiv (\bar{x}, \bar{y}) = (4, 24)$. Thus on joining
the points A and M, we get the line of y on x. Similarly, the line of
x on y can be drawn by joining the points B and M. Since the angle
between these two lines of regression plotted on the graph is an
acute angle ($\theta$) with a very low magnitude, the correlation is
limited positive. As the lines tend to coincide, the correlation
tends to unity.


7.3 Testing the significance of difference between correlation
coefficients : It consists of dividing the difference between the two
correlation coefficients by the standard error of the difference and
then finding the probability of observing this ratio. The
computations of the s.e. and the above mentioned probability depend
upon the following two situations:

(a) When the correlation coefficient in the population is zero,
i.e. $\rho = 0$.

(b) When the correlation coefficient in the population is not
zero, i.e. $\rho \ne 0$.

7.3 (a) Testing the significance of an observed correlation
coefficient r when $\rho = 0$ : Let r be the correlation coefficient of a
random sample of n pairs drawn from a bivariate normal population
with zero correlation coefficient. Then testing the significance of
the observed r is the same as testing the significance of difference
(r - 0), or $\rho = 0$. Here we usually test the hypothesis ($H_0$) that the
sample has been taken from a bivariate normal population with zero
correlation coefficient, i.e. $\rho = 0$. If the hyp. is true, we compute the
statistic

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}},$$

which follows a 't' distribution with (n - 2) d.f. Here the quantity
$\sqrt{(1 - r^2)/(n - 2)}$ is the s.e. of r in a random sample of n. If
the absolute value of this statistic, i.e. $|t| \ge t_{.05}(n - 2)$, we
reject the hyp. at 5% level, otherwise the sample is said to be
consistent with the hypothesis.

The test is based upon the following assumptions:

(1) The sample is a simple random sample.

(2) The sample may be large or small.

(3) The parent population is a bivariate normal.

(4) The population correlation coefficient is zero.

Note : If r is the observed rank correlation coefficient from a
random sample of n pairs, and the hyp. that the population rank
correlation coefficient is zero is correct, then the same statistic 't'
as stated above is used to test the significance of r. It is otherwise
obvious because the rank correlation coefficient can be regarded as
the coefficient of correlation between the two variables.
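
The test statistic above is simple enough to sketch in Python. The following is a hedged illustration only: the function names `t_for_r` and `least_significant_r` are assumptions, and the critical value of 't' is passed in by the reader from a t-table rather than computed, so that no library beyond the standard one is needed. The second function just solves $|t| \ge t_{.05}$ for $|r|$, which gives $|r| \ge t_{.05} / \sqrt{t_{.05}^2 + n - 2}$.

```python
from math import sqrt


def t_for_r(r, n):
    """Return (t, s.e. of r) for testing rho = 0 with (n - 2) d.f."""
    if n <= 2:
        raise ValueError("need more than 2 pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    se = sqrt((1 - r * r) / (n - 2))
    return r / se, se


def least_significant_r(n, t_critical):
    """Smallest |r| for which |t| reaches the tabulated critical value."""
    if n <= 2:
        raise ValueError("need more than 2 pairs")
    return t_critical / sqrt(t_critical ** 2 + n - 2)


t, se = t_for_r(0.75, 10)
print(round(t, 2), round(se, 3))
print(abs(t) >= 2.306)          # 2.306 is the tabulated t at 5% for 8 d.f.
print(round(least_significant_r(27, 2.06), 3))
```

For r = 0.75 and n = 10 the unrounded s.e. gives a t slightly below the hand value obtained when the s.e. is first rounded to two decimals; the decision at the 5% level is the same either way.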

Exp. (4) [a] A sample of 10 villages from Meerut district
showed a correlation coefficient of +0.75 between 'total cultivable
area' and 'area under wheat'. Is this correlation significant ? Also
find the probable error of r.

[b] Find the least value of r in a random sample of 27 pairs
from a bivariate normal population, significant at 5% level.

Sol. [a] ($H_0$) : $\rho = 0$.

Here we compute the statistic

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.75}{\sqrt{\{1 - (0.75)^2\}/(10 - 2)}} = \frac{0.75}{0.23} = 3.26.$$

Now $|t| = 3.26$ and $t_{.05}(8) = 2.306$.

So $|t| > t_{.05}$, leading to the rejection of hyp. at 5% level.

Also, the probable error of r is given by

P.E. (r) = 0.6745 × S.E. (r)
         = 0.6745 × $\sqrt{(1 - r^2)/(n - 2)}$ = 0.6745 × 0.23 = 0.155.

Conclusion : The value of r (= 0.75) is significant at 5%
level, and its probable error is 0.155.

[b] Let the least significant value of correlation coefficient at
5% level be r. Then we must have $|t| \ge t_{.05}$,

or $\left| r / \sqrt{(1 - r^2)/(n - 2)} \right| \ge t_{.05}(25)$, i.e.
$\left| 5r / \sqrt{1 - r^2} \right| \ge 2.06$,

or $25r^2 \ge (2.06)^2 (1 - r^2)$, i.e. $25r^2 \ge 4.2436 - 4.2436\,r^2$,

or $29.2436\,r^2 \ge 4.2436$, i.e. $r^2 \ge 0.145$, so that $|r| \ge 0.381$.

Thus 0.381 is the least value of $|r|$ required. Ans.

7.3 (b) When $\rho \ne 0$ : We shall discuss this problem in the
following three different situations:

Situation (b) when $\rho \ne 0$.

(1) One-sample problem. $H_0 : \rho = \ldots$

(2) Two-sample problem. $H_0 : \rho_1 = \rho_2$.

(3) k-sample problem. $H_0 : \rho_1 = \rho_2 = \ldots = \rho_k$.

(b-1) One-sample problem : It is concerned with testing the
significance of an observed correlation coefficient when $\rho \ne 0$.

Let r be the correlation coefficient of a large random sample
of n pairs drawn from a bivariate normal population with some
specified correlation coefficient. Then testing the significance of the
observed r is the same as testing the significance of difference
$(r - \rho)$, or $\rho = \ldots$ Here we usually test the hyp. that the sample
has been taken from a bivariate normal population with specified
correlation coefficient, i.e. $\rho = \ldots$ If the hyp. is true, we compute
the statistic

$$Z = \frac{z - \xi}{1/\sqrt{n - 3}},$$

which follows a 'standard normal' distribution for large samples.
Here the quantity $1/\sqrt{n - 3}$ is the s.e. of difference $(z - \xi)$
in a random sample of n. The quantity z is distributed asymptotically
normally about the mean $\xi$ with variance $1/(n - 3)$, i.e.
$z \sim AN\left(\xi, \frac{1}{n - 3}\right)$, where by Fisher's
z-transformation :

$$z = 1.1513 \log_{10}\left(\frac{1 + r}{1 - r}\right) \quad \text{and} \quad \xi = 1.1513 \log_{10}\left(\frac{1 + \rho}{1 - \rho}\right).$$

If $|Z| \ge 1.96$, we reject the hyp. at 5% level, otherwise we say
that there is no evidence against the hyp., or the sample is
consistent with the hypothesis.

The test is based upon the following assumptions — 

(1) The sample is a simple random sample.

(2) The sample is large. 

(3) The parent population is a bivariate normal. 

(4) The population correlation coefficient is specified. 
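
A minimal Python sketch of the one-sample test follows. It is offered as an illustration under stated assumptions: the function names `fisher_z` and `one_sample_z` are invented here, and the values of r, the specified population coefficient and n in the usage lines are hypothetical. The sketch relies on the identity $1.1513 \log_{10}\frac{1+r}{1-r} = \frac{1}{2}\log_e\frac{1+r}{1-r}$, which is the inverse hyperbolic tangent provided by `math.atanh`.

```python
from math import atanh, sqrt


def fisher_z(r):
    """Fisher's z-transformation, 1.1513 * log10((1 + r) / (1 - r))."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return atanh(r)


def one_sample_z(r, rho, n):
    """Z = (z - xi) / (1 / sqrt(n - 3)) for a specified population rho."""
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    return (fisher_z(r) - fisher_z(rho)) * sqrt(n - 3)


# Hypothetical figures: observed r = 0.6 from n = 103 pairs, specified rho = 0.4.
z_stat = one_sample_z(0.6, 0.4, 103)
print(round(z_stat, 3))
print(abs(z_stat) >= 1.96)   # reject the hyp. at 5% level when True
```

The comparison with 1.96 is the two-sided 5% rule stated in the text; other levels would need the corresponding normal deviate.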

(b-2) Two-sample problem : It is concerned with testing
the significance of difference between two observed correlation
coefficients $r_1$ and $r_2$ when $\rho \ne 0$.

Let $r_1$ and $r_2$ be the correlation coefficients of two large
independent random samples of sizes $n_1$, $n_2$ drawn from the same
bivariate normal population or from two different populations with
the same correlation coefficient. Then testing the significance of
difference $(r_1 - r_2)$ is the same as testing $\rho_1 = \rho_2$, where
$\rho_1$, $\rho_2$ are the correlation coefficients of the two populations.
Here we usually test the hyp. that the two samples have been taken
from two different populations with the same correlation coefficient,
i.e. $\rho_1 = \rho_2$. If the hyp. is true, we compute the statistic

$$Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}},$$

which follows a 'standard normal' distribution for large samples.
Here the quantity $\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}$
is the s.e. of difference $(z_1 - z_2)$ in two random and independent
samples of sizes $n_1$, $n_2$. The quantities $z_1$, $z_2$ are distributed
asymptotically normally about the common mean $\xi$ with respective
variances $\frac{1}{n_1 - 3}$, $\frac{1}{n_2 - 3}$, i.e.
$z_1 \sim AN\left(\xi, \frac{1}{n_1 - 3}\right)$ and
$z_2 \sim AN\left(\xi, \frac{1}{n_2 - 3}\right)$,
where by Fisher's z-transformations :

$$z_1 = 1.1513 \log_{10}\left(\frac{1 + r_1}{1 - r_1}\right), \quad z_2 = 1.1513 \log_{10}\left(\frac{1 + r_2}{1 - r_2}\right) \quad \text{and} \quad \xi = 1.1513 \log_{10}\left(\frac{1 + \rho}{1 - \rho}\right).$$

If $|Z| \ge 1.96$, we reject the hyp. at 5% level, otherwise the two
samples are said to be consistent with the hypothesis.

The test is based upon the following assumptions— 

(1) The two samples are simple random and independent. 

(2) The samples are large. 

(3) The parent populations are bivariate normal.

(4) The correlation coefficients of the two populations are the same.
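
As a rough illustration of the two-sample test, the statistic may be computed as below. The function name `two_sample_corr_test` is our own, `math.atanh` is used as an exact equivalent of 1.1513 log10((1+r)/(1-r)), and the input figures (samples of 12 and 19 pairs with coefficients 0.75 and 0.88) are used for illustration only.

```python
import math

def two_sample_corr_test(r1, n1, r2, n2):
    """Z = (z1 - z2) / sqrt(1/(n1-3) + 1/(n2-3)) for the hyp. rho1 = rho2."""
    z1 = math.atanh(r1)    # Fisher's z for the first sample
    z2 = math.atanh(r2)    # Fisher's z for the second sample
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

Z = two_sample_corr_test(0.75, 12, 0.88, 19)
print(round(Z, 2), abs(Z) >= 1.96)
```

Since |Z| stays below 1.96 for these figures, the two samples are consistent with the hypothesis.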

(b-3) k-sample problem : It is concerned with testing the
homogeneity of k (>2) observed correlation coefficients
$r_1, r_2, \ldots, r_k$ when $\rho \neq 0$.

Let $r_1, r_2, \ldots, r_k$ be the correlation coefficients of k large
independent random samples of sizes $n_1, n_2, \ldots, n_k$ drawn from the same
bivariate normal population or from k different populations with
the same correlation coefficient. Then testing the homogeneity of
$r_1, r_2, \ldots, r_k$ is the same as testing
$\rho_1 = \rho_2 = \cdots = \rho_k$, where $\rho_1, \rho_2, \ldots, \rho_k$
are the correlation coefficients of the k populations. Here we usually
test the hyp. that the k samples have been taken from k different
populations with the same correlation coefficient, i.e.
$\rho_1 = \rho_2 = \cdots = \rho_k$. If the hyp. is true, we compute the statistic

$$\chi^2 = \sum (n_i - 3)(z_i - \bar z)^2$$

which follows a '$\chi^2$' distribution with (k - 1) d.f. for large samples.
Here the quantity $z_i$ (i = 1, 2, ..., k) is distributed asymptotically
normally about the common mean $\xi$ with variance $\frac{1}{n_i - 3}$, i.e.
$z_i \sim AN\left(\xi, \frac{1}{n_i-3}\right)$, where by Fisher's
z-transformations :

$$z_i = 1.1513 \log_{10}\frac{1+r_i}{1-r_i} \quad \text{and} \quad \xi = 1.1513 \log_{10}\frac{1+\rho}{1-\rho}.$$

Also, the quantity $\bar z\ [= \sum (n_i - 3) z_i / \sum (n_i - 3)]$, the pooled
estimate of $\xi$, is the weighted mean of $z_i$ with corresponding weights
$(n_1 - 3), (n_2 - 3), \ldots, (n_k - 3)$ as the reciprocals of their variances
in order to estimate $\xi$ with minimum variance. This quantity $\bar z$ can
also give us the estimate for $\rho$, using the relation : $\rho = \tanh \bar z$.

If $\chi^2 \ge \chi^2_{.05}(k-1)$, we reject the hyp. at 5% level, otherwise the
samples are said to be consistent with the hypothesis.

The test is based upon the following assumptions— 

(1) The samples are simple random and independent. 

(2) The samples are large. 

(3) The parent populations are bivariate normal. 

(4) The correlation coefficients of the populations are the same.
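
A minimal sketch of the k-sample computation follows. The function name `corr_homogeneity` and the illustrative inputs (three samples of 23, 33 and 53 pairs with coefficients 0.40, 0.60 and 0.50) are our own assumptions; the tabulated value 5.991 is the 5% point of chi-square with 2 d.f.

```python
import math

def corr_homogeneity(rs, ns):
    """Return (chi2, d.f., pooled rho) for the hyp. rho_1 = ... = rho_k."""
    ws = [n - 3 for n in ns]              # weights = reciprocals of variances
    zs = [math.atanh(r) for r in rs]      # Fisher's z for each sample
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)   # pooled estimate of xi
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return chi2, len(rs) - 1, math.tanh(z_bar)

chi2, df, rho_pooled = corr_homogeneity([0.40, 0.60, 0.50], [23, 33, 53])
print(round(chi2, 3), df, round(rho_pooled, 3), chi2 >= 5.991)
```

With unrounded z-values the statistic is about 0.90, well below 5.991, and the pooled estimate of the common correlation coefficient is about 0.51.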

Exp. (5). [a] A correlation coefficient of 0.73 is obtained from
a random sample of 28 pairs. Is this correlation significantly
different from 0.5 ?

[b] The correlation coefficients between the temperatures of
unhusked rice and breakage percentage calculated from two independent
random samples of sizes 12 and 19 are 0.75 and 0.88 respectively.
Do the two estimates differ significantly ?

[c] Three independent samples of 23, 33 and 53 pairs of
values give correlation coefficients 0.40, 0.60 and 0.50 respectively.
Are these correlation coefficients homogeneous ?

Sol. [a] ($H_0$) : $\rho = 0.5$.

Here we compute the statistic

$$Z = \frac{z - \xi}{1/\sqrt{n-3}}, \quad \text{where} \quad z = 1.1513 \log_{10}\frac{1+0.73}{1-0.73} = 0.93 \quad \text{and} \quad \xi = 1.1513 \log_{10}\frac{1+0.5}{1-0.5} = 0.55,$$

so that

$$Z = \frac{0.93 - 0.55}{1/\sqrt{25}} = 5 \times 0.38 = 1.90.$$

So $|Z| < 1.96$ showing no evidence against the hyp. at 5% level.
Conclusion : The value of r (= 0.73) is not significantly
different from 0.5 at 5% level of significance.

[b] ($H_0$) : $\rho_1 = \rho_2$.

Here we compute the statistic

$$Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}},$$

where $z_1 = 1.1513 \log_{10}\frac{1+0.75}{1-0.75} = 0.97$
and $z_2 = 1.1513 \log_{10}\frac{1+0.88}{1-0.88} = 1.38$, so that

$$Z = \frac{0.97 - 1.38}{\sqrt{\dfrac{1}{9} + \dfrac{1}{16}}} = -0.41 \times \frac{12}{5} = -0.98.$$

So $|Z| < 1.96$ showing no evidence against the hyp. at 5% level.
Conclusion : The two estimates are not significantly different
at 5% level of significance.

[c] ($H_0$) : $\rho_1 = \rho_2 = \rho_3$.

Here we compute the statistic $\chi^2\ [= \sum (n_i - 3)(z_i - \bar z)^2]$ from the
following table :

Sample    n_i-3    r_i    z_i    (n_i-3)z_i    z_i-z̄    (z_i-z̄)^2    (n_i-3)(z_i-z̄)^2
No. i
  1         20     .40    .42        8.4        -.15       .0225           .450
  2         30     .60    .69       20.7         .12       .0144           .432
  3         50     .50    .55       27.5        -.02       .0004           .020
Total      100      -      -        56.6          -          -             .902 = χ²

where $\bar z = 56.6/100 = 0.57$.

Now we have

$\chi^2 = \sum (n_i - 3)(z_i - \bar z)^2 = 0.902$, and $\chi^2_{.05}(2) = 5.991$.

So $\chi^2 < \chi^2_{.05}$ showing no evidence against the hyp. at 5% level.

Conclusion : The given correlation coefficients do not differ
significantly at 5% level of significance, and hence they are
homogeneous.

7.4 Testing the significance of an observed regression
coefficient b when $\beta$ is some specified value : Let $b_{yx}$ be the
regression coefficient of y on x in a random sample of n pairs drawn
from a bivariate normal population with specified regression coefficient.
Then testing the significance of the observed regression coefficient
b is the same as testing the significance of difference $(b_{yx} - \beta)$, or
$\beta = \beta_0$. Here we usually test the hyp. that the sample has been
taken from a bivariate normal population with specified regression
coefficient, i.e. $\beta = \beta_0$. If the hyp. is true, we compute the statistic

$$t = \frac{b_{yx} - \beta_0}{\sqrt{(\sigma_y^2 - b_{yx}^2 \sigma_x^2)/\{(n-2)\sigma_x^2\}}}$$

which follows a 't' distribution with (n - 2) d.f. Here the quantity
$\sqrt{(\sigma_y^2 - b_{yx}^2 \sigma_x^2)/\{(n-2)\sigma_x^2\}}$ is the s.e.
of difference $(b_{yx} - \beta_0)$ in a random sample of n giving $\sigma_x^2$,
$\sigma_y^2$ as the variances of x, y respectively. If $|t| \ge t_{.05}(n-2)$,
we reject the hyp. at 5% level, otherwise the sample is said to be consistent with
the hypothesis.

The regression coefficient of x on y in a random sample of n
pairs with corresponding specified value $\beta_0$ in the population can be
tested similarly by computing the statistic

$$t = \frac{b_{xy} - \beta_0}{\sqrt{(\sigma_x^2 - b_{xy}^2 \sigma_y^2)/\{(n-2)\sigma_y^2\}}}$$

and then comparing $|t|$ against $t_{.05}(n-2)$.

The test is based upon the following assumptions — 

(1) The sample is a simple random sample.

(2) The sample may be large or small. 

(3) The parent population is a bivariate normal. 

(4) The population regression coefficient is specified. 
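
The t-statistic for a regression coefficient can be computed directly from the sums of squares and products, since the factor n cancels between numerator and denominator of the s.e. The sketch below is only illustrative: the function name `regression_coeff_t` and its argument names are our own, and the figures are n = 50 with sums 1225, 441 and 451.

```python
import math

def regression_coeff_t(sxx, syy, sxy, n, beta0=0.0):
    """Return (b_yx, t) for the hyp. beta = beta0, with (n - 2) d.f.

    sxx, syy are sums of squares of deviations of x and y,
    sxy is the sum of products of deviations.
    """
    b = sxy / sxx
    # s.e. of b: sqrt((syy - b^2 * sxx) / ((n - 2) * sxx))
    se = math.sqrt((syy - b * b * sxx) / ((n - 2) * sxx))
    return b, (b - beta0) / se

b_yx, t = regression_coeff_t(sxx=1225, syy=441, sxy=451, n=50)
print(round(b_yx, 3), round(t, 2))
```

Carried through without rounding, t comes to about 5.38; a hand computation with b rounded to 0.37 and the s.e. to 0.068 gives a slightly larger figure, and either way t exceeds the tabulated 2.01.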

7.5 Testing the significance of an observed regression
function : Let r be the correlation coefficient of a random
sample of n pairs drawn from a bivariate normal population with
zero correlation coefficient. Then to test the hyp. that the
sample indicates the same degree of association between the two
variables as expressed by a regression equation, we compute the
statistic

$$F = \frac{r^2}{(1 - r^2)/(n-2)}$$

which follows an 'F' distribution with (1, n - 2) d.f. If
$F \ge F_{.05}(1, n-2)$, we reject the hyp. at 5% level, otherwise
the sample is said to be consistent with the hypothesis.

This statistic F, obviously, is the square of the statistic 't'
used for testing the significance of r when $\rho = 0$, and hence the two
tests are equivalent. Therefore, the assumptions for the test are the
same as stated for the case when $\rho = 0$ in § 7.3(a).

Exp. (6). Given that : n = 50, $\bar x = 30$, $\bar y = 25$,
$\sum (x - \bar x)^2 = 1225$, $\sum (y - \bar y)^2 = 441$ and
$\sum (x - \bar x)(y - \bar y) = 451$. Calculate $b_{yx}$ and test its significance.

Sol : We have

$$b_{yx} = \frac{\sum (x - \bar x)(y - \bar y)}{\sum (x - \bar x)^2} = \frac{451}{1225} = 0.37.$$

$H_0$ : $\beta = 0$.

Here we compute the statistic

$$t = \frac{b_{yx} - \beta_0}{\sqrt{(\sigma_y^2 - b_{yx}^2 \sigma_x^2)/\{(n-2)\sigma_x^2\}}} = \frac{0.37 - 0}{\sqrt{\{441 - (0.37)^2 \times 1225\}/(48 \times 1225)}} = \frac{0.370}{0.068} = 5.44,$$

where $\beta_0 = 0$ is the value specified by the hyp.

Now $|t| = 5.44$ and $t_{.05}(48) = 2.01$.

So $|t| > t_{.05}$ leading to the rejection of hyp. at 5% level.

Conclusion : The regression coefficient $b_{yx}$ is 0.37, which is
significant at 5% level for n = 50.
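
The equivalence of the F test of § 7.5 and the t test of r can be checked numerically. In the sketch below the function names `corr_t` and `regression_F` are our own, the t-statistic is taken in its usual form $t = r\sqrt{n-2}/\sqrt{1-r^2}$, and r is computed from the sums 451, 1225 and 441 with n = 50.

```python
import math

def corr_t(r, n):
    """t-statistic for testing r when rho = 0, with (n - 2) d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def regression_F(r, n):
    """F = r^2 / ((1 - r^2)/(n - 2)), with (1, n - 2) d.f."""
    return r * r / ((1 - r * r) / (n - 2))

r = 451 / math.sqrt(1225 * 441)   # correlation coefficient from the sums
t = corr_t(r, 50)
F = regression_F(r, 50)
print(round(r, 3), round(t, 2), round(F, 2), round(t * t, 2))
```

The last two printed values agree, which is the sense in which the two tests are equivalent.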

Exp. (7). [a] What is a linear regression ? How is the principle
of least squares employed for estimating the constants of regression
in a regression equation ?

[b] Give the relation between the regression coefficients and 
the correlation coefficient. What is the main difference between 
these two types of coefficients ? Also point out the difference 
between the correlation and regression theories. 

Sol. [a] If the points on a scatter diagram seem to cluster
about a straight line, it suggests some linear relationship between
the variables, and this straight line is called the line of regression.
This gives us the average value of the dependent variable corresponding
to a given value of the independent variable. If both the
variables can take the role of independent variable, we have two
regression lines : (1) of y on x, and (2) of x on y. These lines give
us the average values of y and x for corresponding given values of
x and y respectively. The two lines, in general, are different except
in the case when the correlation between x and y is perfect.

Let us now consider y = a + bx to be the equation of the regression
line of y on x in a random sample of n pairs drawn from a
bivariate normal population, where a and b are the constants of
regression. If $y_e$ (= a + bx) be the estimated value of y for a given
value of x, and $nS_y^2 = \sum (y - y_e)^2 = \sum (y - a - bx)^2 = R$ be the sum of
squares of deviations of the observed value y from the estimated
value $y_e$, summation being taken over all pairs of values, then to
estimate the constants a and b, the principle of least squares (PLS)
is employed. According to this principle, the constants a and b
are so chosen that the residual sum of squares R is minimum.

For this purpose, we partially differentiate R with respect to
a and b, and then equate each to zero to get the two normal
equations as

$$\frac{\partial R}{\partial a} = -2\sum (y - a - bx) = 0, \qquad \frac{\partial R}{\partial b} = -2\sum x(y - a - bx) = 0,$$

or

$$\sum y - na - b\sum x = 0, \qquad \sum xy - a\sum x - b\sum x^2 = 0,$$

i.e.

$$\bar y = a + b\bar x \quad (1), \qquad \text{cov}(x,y) + \bar x \bar y = a\bar x + b(\bar x^2 + \sigma_x^2) \quad (2).$$

Multiplying (1) by $\bar x$ and subtracting from (2) we get

$$b = \frac{\text{cov}(x,y)}{\sigma_x^2} \quad \text{and then} \quad a = \bar y - \frac{\text{cov}(x,y)}{\sigma_x^2}\,\bar x.$$

Substituting the estimated values of a and b into the equation y = a + bx, the equ.
to the line of regression of y on x is

$y - \bar y = b_{yx}(x - \bar x)$ ... (I), where $b_{yx} = \text{cov}(x,y)/\sigma_x^2$.

Similarly, the equ. to the line of regression of x on y is

$x - \bar x = b_{xy}(y - \bar y)$ ... (II), where $b_{xy} = \text{cov}(x,y)/\sigma_y^2$.

From (I), (II), it is obvious that both the lines pass through the
mean point $(\bar x, \bar y)$. The coefficients $b_{yx}$ and $b_{xy}$ are called the
regression coefficients of y on x, and of x on y respectively.
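
The least-squares estimates obtained from the normal equations can be sketched as follows. The function name `fit_line` and the five illustrative pairs of values are our own and serve only to show the computation.

```python
def fit_line(xs, ys):
    """Least-squares constants (a, b) of y = a + b*x from the normal equations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    b = cov_xy / var_x          # b_yx = cov(x, y) / var(x)
    a = y_bar - b * x_bar       # the line passes through (x_bar, y_bar)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

# Both normal equations should be satisfied by the fitted constants.
res_sum = sum(y - a - b * x for x, y in zip(xs, ys))
res_x_sum = sum(x * (y - a - b * x) for x, y in zip(xs, ys))
print(round(a, 2), round(b, 2), abs(res_sum) < 1e-9, abs(res_x_sum) < 1e-9)
```

For these values the fitted line is y = 2.2 + 0.6x, and the two sums appearing in the normal equations vanish up to rounding.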

[b] The correlation coefficient between the two variables x
and y is the geometric mean of their two regression coefficients,
i.e. $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$. The sign of r is (+) or (-) according to the
sign possessed by either of the regression coefficients.

The main difference between the coefficients of correlation and
those of regression may be seen as follows :

(1) The coefficient of correlation is a measure of direction and 
extent of correlation between the two variables, and it indicates 
whether the change in one variable tends to bring a change in the 
other variable in the same or reverse direction. But a coefficient of 
regression gives the average change in one variable corresponding 
to a unit change in the other variable. 

(2) The coefficient of correlation can never exceed unity while 
a regression coefficient can. 

(3) The coefficient of correlation between any two variables 
is always symmetrical in the variables but a regression coefficient 
is rarely so. 

(4) The coefficient of correlation is independent of the
change of both origin and scale but a regression coefficient is
independent of the change of origin only, and not of scale.

The fundamental difference between the theories of correlation
and regression may be looked at as follows :

(1) In the correlation theory both the variables are assumed as 
random while in the theory of regression, one of the two variables 
is treated as independent and the other as dependent. 

(2) No correlation implies no regression of the variables 
but its reverse is not always true. 

Exp. (8). [a] Show that $\theta$, the acute angle between the two
lines of regression, is given by :
$\tan\theta = (1 - r^2)\sigma_x\sigma_y / \{r(\sigma_x^2 + \sigma_y^2)\}$. Interpret
the case when r = 0, ±1.

[b] Obtain the standard error of estimate of y (or x), and
hence or otherwise show that the departure of the value of $r^2$ from
unity is a measure of departure of the relationship between the two
variables from linearity.

Sol. [a] If $\theta_1$, $\theta_2$ be the angles which the two regression lines
make with the axis of x, then the slopes of the two lines may be
given as $\tan\theta_1 = r\sigma_y/\sigma_x = b_{yx}$, and
$\tan\theta_2 = \sigma_y/(r\sigma_x) = 1/b_{xy}$. Thus the
acute angle $\theta$ between the two lines of regression is given by

$$\tan\theta = \tan(\theta_2 - \theta_1) = \frac{\tan\theta_2 - \tan\theta_1}{1 + \tan\theta_2 \tan\theta_1} = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y}{r\sigma_x}\cdot\dfrac{r\sigma_y}{\sigma_x}} = \frac{\sigma_y(1 - r^2)/(r\sigma_x)}{(\sigma_x^2 + \sigma_y^2)/\sigma_x^2} = \frac{(1 - r^2)\,\sigma_x\sigma_y}{r(\sigma_x^2 + \sigma_y^2)}.$$

Hence proved.

If r = 0, then $\theta = 90°$ so that the two lines of regression are
perpendicular to each other. In this situation, the estimated value
of y (or x) is the same for all values of x (or y). But if r = ±1,
then $\theta = 0°$ or $180°$ so that the two lines of regression coincide with
each other. Ans.
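
The result can be verified numerically by computing the angle both from the closed formula and from the two slopes. The function names and the illustrative values r = 0.6, s.d. of x = 3 and s.d. of y = 4 are our own; r is assumed positive and non-zero here, so that both expressions are defined and the angle is acute.

```python
import math

def tan_angle_formula(r, sx, sy):
    """tan(theta) = (1 - r^2) * sx * sy / (r * (sx^2 + sy^2))."""
    return (1 - r * r) * sx * sy / (r * (sx * sx + sy * sy))

def tan_angle_from_slopes(r, sx, sy):
    """Same quantity from the slopes of the two regression lines."""
    m1 = r * sy / sx         # slope of the line of y on x
    m2 = sy / (r * sx)       # slope of the line of x on y
    return (m2 - m1) / (1 + m1 * m2)

t1 = tan_angle_formula(0.6, 3, 4)
t2 = tan_angle_from_slopes(0.6, 3, 4)
theta_deg = math.degrees(math.atan(t1))
print(round(t1, 3), round(t2, 3), round(theta_deg, 1))
```

Both routes give tan θ = 0.512 for these values; as r approaches 1 the value tends to zero, i.e. the two lines close up on each other.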

[b] Let $y_e\ [= \bar y + b_{yx}(x - \bar x)]$ be the estimated value of y
corresponding to a given value of x obtained from the regression equation
of y on x in a random sample of n drawn from a bivariate normal
population. Then the minimum residual sum of squares of y is
given by

$$\begin{aligned}
nS_y^2 &= \sum (y - y_e)^2 \\
&= \sum (y - \bar y)^2 - 2b_{yx}\sum (x - \bar x)(y - \bar y) + b_{yx}^2 \sum (x - \bar x)^2 \\
&= n\sigma_y^2 - 2b_{yx}\, n r \sigma_x \sigma_y + b_{yx}^2\, n\sigma_x^2 \\
&= n\left[\sigma_y^2 - 2\,\frac{r\sigma_y}{\sigma_x}\, r\sigma_x\sigma_y + \frac{r^2\sigma_y^2}{\sigma_x^2}\,\sigma_x^2\right] \\
&= n\sigma_y^2 (1 - r^2).
\end{aligned}$$

Therefore $S_y^2 = \sigma_y^2(1 - r^2)$, and $S_y = \sigma_y(1 - r^2)^{1/2}$ ... (1)

This value of $S_y$, given by (1), is called the standard error of
estimate of y, or sometimes the root-mean square error of estimate
of y. Similarly, the standard error of estimate of x is given by

$S_x = \sigma_x(1 - r^2)^{1/2}$ ... (2)

Now from (1) or (2), it is obvious that $r^2 \le 1$, or $-1 \le r \le +1$, since
the sum of squares of deviations can never be negative. Thus if r = ±1,
the sum of squares of deviations from either of the regression lines
is zero, and consequently each deviation vanishes showing that all
the points lie on both the lines of regression. It means that both
the lines then coincide with each other and hence there is a linear
functional relation between the two variables x and y under study.
Further, as $r^2$ approaches unity, the residual mean squares $S_x^2$ and
$S_y^2$ approach zero so that the points are closer to the regression lines
which tend to coincide. Therefore, the departure of the value of $r^2$
from unity is a measure of departure of the relationship between
the two variables from linearity.

Exp. (9). [a] In a partially destroyed laboratory record of an
analysis of correlation data, the following results only are legible :
Variance of x = 9, the regression equs : 8x - 10y + 66 = 0,
40x - 18y = 214.

Find, what were (i) the mean values of x and y; (ii) the correlation
coefficient between x and y, and (iii) the s.d. of y ?

[b] Find the most likely price in Bombay corresponding to
the price of Rs. 70 in Calcutta from the following data :

                          Calcutta    Bombay
Average price ...            65         67
Standard deviation ...       2.5        3.5

Coefficient of correlation is +0.8 between the two prices of the
commodities in the two cities.

[c] Given that : n = 50, $\bar x = 30$, $\bar y = 25$, $\sum (x - \bar x)^2 = 1225$,
$\sum (y - \bar y)^2 = 441$, and $\sum (x - \bar x)(y - \bar y) = 451$. Find the two
regression lines and hence or otherwise the coefficient of correlation
between x and y.

Sol : [a] (i) Since we know that the two regression lines
intersect at the point $(\bar x, \bar y)$, so the means of x and y can be found
by solving the equations 8x - 10y + 66 = 0, and 40x - 18y = 214.
Thus we have :

8x - 10y = -66, 40x - 18y = 214; or 40x - 50y = -330, 40x - 18y = 214,

giving x = 13, y = 17, i.e. $(\bar x, \bar y) = (13, 17)$.

Hence the mean values of x and y are 13, 17 respectively. Ans.

(ii) In order to find the correlation coefficient r between x
and y, first we have to find the two regression coefficients $b_{yx}$ and
$b_{xy}$. These coefficients require the determination of two regression
lines of y on x, and of x on y. There may be two possibilities :

I. 8x - 10y + 66 = 0 may be the regression line of x on y, and
40x - 18y = 214 may be the regression line of y on x.

Then the lines : $x = \frac{10}{8}y - \frac{66}{8}$, and $y = \frac{40}{18}x - \frac{214}{18}$ give us
$b_{xy} = 10/8$ and $b_{yx} = 40/18$, so that
$b_{xy} \times b_{yx} = r^2 = \frac{10}{8} \times \frac{40}{18} > 1$, hence
inadmissible.

II. 8x - 10y + 66 = 0 may be the regression line of y on x, and
40x - 18y = 214 may be the regression line of x on y.

Then the lines : $y = \frac{8}{10}x + \frac{66}{10}$, and $x = \frac{18}{40}y + \frac{214}{40}$ give us
$b_{yx} = 8/10$ and $b_{xy} = 18/40$, so that
$b_{yx} \times b_{xy} = r^2 = \frac{8}{10} \times \frac{18}{40} < 1$, hence
admissible.

Thus we get $r^2 = \frac{8}{10} \times \frac{18}{40} = 0.36$, therefore r = ±0.6.

Since the regression coefficients are (+)ve, so the sign of r
is also (+)ve, i.e. r = +0.6. Ans.

(iii) Since $b_{yx} = r\sigma_y/\sigma_x$, we have
$\frac{8}{10} = \frac{0.6 \times \sigma_y}{3}$, so that $\sigma_y = 4$.
Thus the s.d. of y is 4. Ans.

[b] If x and y represent the prices (in Rs.) at Calcutta and
Bombay respectively, then the regression line of y on x is

$y - \bar y = r\,\frac{\sigma_y}{\sigma_x}(x - \bar x)$, i.e. $y - 67 = \frac{0.8 \times 3.5}{2.5}(x - 65)$,

or y - 67 = 1.12(x - 65), i.e. y = 1.12x - 5.80.
Therefore $y_e$ = 1.12 × 70 - 5.80 = 72.60 Rs., the most likely price in
Bombay corresponding to the price of Rs. 70 in Calcutta. Ans.

[c] The regression coefficient of y on x is

$b_{yx} = \sum (x - \bar x)(y - \bar y)/\sum (x - \bar x)^2 = 451/1225 = 0.37$.

Using this value, the regression line of y on x will be as

y - 25 = 0.37(x - 30), i.e. y = 0.37x + 13.9. Ans.

Similarly, $b_{xy} = \sum (x - \bar x)(y - \bar y)/\sum (y - \bar y)^2 = 451/441 = 1.02$ is the
regression coefficient of x on y. Using this value, the regression
line of x on y will be as

x - 30 = 1.02(y - 25), i.e. x = 1.02y + 4.50. Ans.

Also, $r^2 = b_{yx} \times b_{xy} = 0.37 \times 1.02 = 0.3774$,
therefore r = +0.61. Ans.

Aliter : We know that

$$r = \frac{\sum (x - \bar x)(y - \bar y)}{\sqrt{\sum (x - \bar x)^2 \cdot \sum (y - \bar y)^2}} = \frac{451}{\sqrt{1225 \times 441}} = +0.61. \quad \text{Ans.}$$

7.6 Multiple and Partial Correlation.

7.6.1 Introduction : In a multivariate population consisting
of three or more variables, sometimes we are interested in knowing
the relationship between any two variables. In such cases, the
different variables may be mutually related by some one or the other
phenomenon which will usually be influenced by the other remaining
variables of the population. For example, the grain-yields are
affected by the sowing-dates, row spacings, depths of ploughing
or sowing, levels of irrigation and doses of a fertilizer used. Such
type of relationship between any two variables may be studied in
the following two ways :

(i) Specification method,
(ii) Elimination method.

(i) Specification method : Here we consider only those values of the
observed data in which the other variables have some specified values.
It is usually employed in finding the combined
influence of a group of variables upon a variable not the member
of the group. The method is useful in the study of multiple correlation
and multiple regression. But it has the disadvantage of
restricting the size of the data, and also the results of this can be
applied to only those situations wherein the other variables have
some specified values.

(ii) Elimination method : Here we eliminate the influence
of the other remaining variables on the two variables under study.
The method is useful in the study of partial correlation. But it
has the disadvantage that only the linear effect of the variables can
mathematically be eliminated from both the variables under study,
and not the entire influence.

7.7 Determination of multiple regression equations : Let us
consider a random sample of n sets of corresponding values of the
three variables $x_1$, $x_2$ and $x_3$ drawn from a trivariate normal
population. If these variables are measured from their respective means
$\bar x_1$, $\bar x_2$ and $\bar x_3$, i.e. the origin is necessarily at the means, then the
quantities so obtained can be represented by $x_1$, $x_2$ and $x_3$. Now, the
multiple regression equation of $x_1$ on $x_2$, $x_3$ can be given as

$x_1 = a + b_{12.3}\,x_2 + b_{13.2}\,x_3$ ... (1)

where the constants a and b's are such as to provide on the
average the best estimate of $x_1$ viz. $x_{1e}$ for some specified values of
the remaining variables $x_2$ and $x_3$. The quantities a and b's are
called the constants of regression.

The constants $b_{12.3}$ and $b_{13.2}$ are known as the partial regression
coefficients of $x_1$ on $x_2$, and of $x_1$ on $x_3$ respectively, which sometimes
may be called as the regression coefficients of 1st order. The
order is decided by the no. of the subscripts after the point. The
subscripts occurring before a point are called the primary subscripts,
those after the point as secondary. Amongst the primary, the first
subscript to the b's denotes the subscript of the dependent variable
and the second subscript to that of the x to which it is attached,
while the secondary subscripts denote the subscripts of those
variables whose effects have been eliminated from those of primary.
Thus $b_{12.3}$ is the regression coefficient of 1st order of $x_1$ on $x_2$ after
the linear effect of the third variable $x_3$ has been eliminated from
both $x_1$ and $x_2$. Or, it may be said as the regression coefficient of
$x_{1.3}$ on $x_{2.3}$, where $x_{1.3}\ (= x_1 - b_{13}x_3)$ and $x_{2.3}\ (= x_2 - b_{23}x_3)$ are the
residuals of 1st order of $x_1$ on $x_3$, and of $x_2$ on $x_3$ respectively. The
similar meanings are attached with the other coefficient $b_{13.2}$.

The above said constants a and b's of the equ. (1) can be
obtained by employing the principle of least squares. According to
this principle, the values of a and b's are so chosen that the
residual sum of squares of $x_1$ (i.e. the sum of squares of deviations
of the observed values $x_1$ from its corresponding estimated values
$x_{1e}$ over all n sets of values of $x_2$, $x_3$) is minimum. This residual
sum of squares is given by

$\sum (x_1 - x_{1e})^2 = \sum (x_1 - a - b_{12.3}\,x_2 - b_{13.2}\,x_3)^2 = \sum x_{1.23}^2$ ... (2)

Thus to obtain the best estimate of $x_1$, given by (1), we
differentiate (2) partially with respect to a, $b_{12.3}$ and $b_{13.2}$, and then
equate each to zero to get the following three normal equations :
$\sum x_{1.23} = 0$, $\sum x_2 x_{1.23} = 0$ and $\sum x_3 x_{1.23} = 0$. Solving* these three
equations simultaneously for the three unknown constants a, $b_{12.3}$
and $b_{13.2}$, we get a = 0,

$$b_{12.3} = -\frac{\sigma_1}{\sigma_2}\cdot\frac{\Delta_{12}}{\Delta_{11}} \quad \text{and} \quad b_{13.2} = -\frac{\sigma_1}{\sigma_3}\cdot\frac{\Delta_{13}}{\Delta_{11}} \quad \ldots (3)$$

for which $\sum x_{1.23}^2$ is minimum and $x_{1e}$ is the best estimate of $x_1$. Here the
quantities $\sigma_1$, $\sigma_2$ and $\sigma_3$ are the respective s.ds. of $x_1$, $x_2$ and $x_3$; and
$\Delta_{11}$, $\Delta_{12}$, $\Delta_{13}$ are the respective cofactors of the elements in the 1st row
and 1st col., 1st row and 2nd col., 1st row and 3rd col. of the
3×3 determinant

$$\Delta = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix}.$$

This determinant of total correlation coefficients (or correlation
coefficients of zero order) is obviously symmetrical in the
off-diagonal elements, since $r_{12} = r_{21}$; $r_{23} = r_{32}$ and $r_{31} = r_{13}$ owing to the
property of symmetry of the total correlation coefficients. The
values of the cofactors and the determinant are given as

$$\Delta_{11} = (-1)^{1+1}\begin{vmatrix} 1 & r_{23} \\ r_{32} & 1 \end{vmatrix} = +(1 - r_{23}^2); \qquad \Delta_{12} = (-1)^{1+2}\begin{vmatrix} r_{21} & r_{23} \\ r_{31} & 1 \end{vmatrix} = -(r_{21} - r_{31}r_{23});$$

$$\Delta_{13} = (-1)^{1+3}\begin{vmatrix} r_{21} & 1 \\ r_{31} & r_{32} \end{vmatrix} = +(r_{21}r_{32} - r_{31}); \quad \text{and} \quad \Delta = \Delta_{11} + r_{12}\Delta_{12} + r_{13}\Delta_{13} = 1 - r_{12}^2 - r_{23}^2 - r_{31}^2 + 2r_{12}r_{23}r_{31}.$$

* For solution of the normal equations, please see § 7.12

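If we substitute these values of $\Delta$'s into (3), we get the values of
the regression coefficients of 1st order in terms of correlation
coefficients and s.ds. of zero order each. Hence on substituting
these computed values of the constants a and b's from (3) into (1),
we get the desired equation of the regression plane of $x_1$ on $x_2$, $x_3$ as

$$x_1 = -\frac{\sigma_1}{\sigma_2}\cdot\frac{\Delta_{12}}{\Delta_{11}}\,x_2 - \frac{\sigma_1}{\sigma_3}\cdot\frac{\Delta_{13}}{\Delta_{11}}\,x_3, \quad \text{or} \quad \frac{x_1}{\sigma_1}\Delta_{11} + \frac{x_2}{\sigma_2}\Delta_{12} + \frac{x_3}{\sigma_3}\Delta_{13} = 0 \quad \ldots (4)$$

If the origin is not necessarily at the means, the above
equation of the regression plane will be as

$$\frac{x_1 - \bar x_1}{\sigma_1}\Delta_{11} + \frac{x_2 - \bar x_2}{\sigma_2}\Delta_{12} + \frac{x_3 - \bar x_3}{\sigma_3}\Delta_{13} = 0 \quad \ldots (4')$$

The multiple regression equations of $x_2$ on $x_1$, $x_3$; and of $x_3$
on $x_1$, $x_2$ can be obtained similarly. Generalizing this trivariate*
case to that of p variates, we can show that the equ. of the
regression plane of $x_1$ on $x_2, x_3, \ldots, x_p$ will be as

$$\frac{x_1}{\sigma_1}\Delta_{11} + \frac{x_2}{\sigma_2}\Delta_{12} + \cdots + \frac{x_p}{\sigma_p}\Delta_{1p} = 0 \quad \ldots (5) \quad \text{(if origin is at the means)}$$

$$\frac{x_1 - \bar x_1}{\sigma_1}\Delta_{11} + \frac{x_2 - \bar x_2}{\sigma_2}\Delta_{12} + \cdots + \frac{x_p - \bar x_p}{\sigma_p}\Delta_{1p} = 0 \quad \ldots (5') \quad \text{(if origin is not at the means)}$$

where $\bar x$'s, $\sigma$'s are the means, s.ds. of the corresponding
variables x's; and $\Delta$'s are the cofactors of the corresponding elements
in the p×p determinant $\Delta$ of total correlation coefficients.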
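
The partial regression coefficients of relation (3) can be computed from the cofactors as sketched below. The helper names `cofactor3` and `partial_regression`, and the illustrative values of the total correlation coefficients (0.8, 0.6, 0.5) and s.ds. (4, 2, 5), are our own assumptions chosen only to show the arithmetic.

```python
def cofactor3(m, i, j):
    """Cofactor of element (i, j) of a 3x3 matrix m (0-based indices)."""
    rows = [k for k in range(3) if k != i]
    cols = [k for k in range(3) if k != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def partial_regression(r12, r13, r23, s1, s2, s3):
    """Return (b12.3, b13.2) from total correlations and s.ds."""
    m = [[1.0, r12, r13],
         [r12, 1.0, r23],
         [r13, r23, 1.0]]
    d11 = cofactor3(m, 0, 0)
    d12 = cofactor3(m, 0, 1)
    d13 = cofactor3(m, 0, 2)
    b12_3 = -(s1 / s2) * d12 / d11
    b13_2 = -(s1 / s3) * d13 / d11
    return b12_3, b13_2

b12_3, b13_2 = partial_regression(0.8, 0.6, 0.5, s1=4, s2=2, s3=5)
print(round(b12_3, 4), round(b13_2, 4))
```

For these values $b_{12.3}$ is 4/3 and $b_{13.2}$ is about 0.2133, which agree with the expanded forms $\frac{\sigma_1}{\sigma_2}\cdot\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}$ and $\frac{\sigma_1}{\sigma_3}\cdot\frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}$.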

* The theory of trivariate case was developed by Prof. K.
Pearson (1896) and a year later its generalization was given by Yule.

7.8 Properties of the residuals : The residuals of any order
have the following three main properties :

1st : The sum of the products of the corresponding values of
a variate and a residual is always zero provided the subscript of the
variate occurs among the secondary subscripts of the residual.
For example, $\sum x_1 x_{2.13} = 0 = \sum x_3 x_{2.13}$, and $\sum x_2 x_{3.12} = 0 = \sum x_1 x_{3.12}$ etc.

In general, we have $\sum x_i x_{1.23 \ldots p} = 0$ etc. for i = 2, 3, ..., p.

2nd : The sum of the products of any two residuals remains
unchanged provided we remove from one residual any or all of
the secondary subscripts which are common to both. For example,
$\sum x_{1.23}\, x_{1.3} = \sum x_{1.23}(x_1 - b_{13}x_3) = \sum x_{1.23}\, x_1$, and
$\sum x_{1.23}\, x_{1.23} = \sum x_{1.23}(x_1 - b_{12.3}x_2 - b_{13.2}x_3) = \sum x_{1.23}\, x_1$ etc.

In general, we have

$\sum x_{1.34 \ldots p}\, x_{2.34 \ldots p} = \sum x_{1.34 \ldots p}\, x_2 = \sum x_1\, x_{2.34 \ldots p}$ etc.

3rd : The sum of the products of any two residuals is always
zero provided all the subscripts of a residual occur among the
secondary subscripts of the other. For example,
$\sum x_{1.23}\, x_{2.3} = \sum x_{1.23}(x_2 - b_{23}x_3) = 0$, and
$\sum x_{1.23}\, x_{3.2} = \sum x_{1.23}(x_3 - b_{32}x_2) = 0$ etc.

In general, we have $\sum x_{1.234 \ldots p}\, x_{2.34 \ldots p} = 0$ etc.

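7.9 Determination of residual variances : Let us consider
a random sample of n sets of values $(x_1, x_2, x_3)$ drawn from a
trivariate normal population. If the variables $(x_1, x_2, x_3)$ are measured
from their respective means $(\bar x_1, \bar x_2, \bar x_3)$, then the residual variances
of these variables can be represented in terms of their s.ds. and
correlation coefficients of zero order each. For example, the
variance of the second order residual of $x_1$ on $x_2$, $x_3$ can be found
as follows :

$$\text{var}(x_{1.23}) = \frac{1}{n}\sum x_{1.23}^2 = \frac{1}{n}\sum x_{1.23}\, x_{1.23} = \frac{1}{n}\sum x_1\, x_{1.23} \quad \text{(by 2nd property)}$$

or

$$\begin{aligned}
\sigma_{1.23}^2 &= \frac{1}{n}\sum x_1 (x_1 - b_{12.3}x_2 - b_{13.2}x_3) \\
&= \frac{1}{n}\left[\sum x_1^2 - b_{12.3}\sum x_1 x_2 - b_{13.2}\sum x_1 x_3\right] \\
&= \frac{1}{n}\left[n\sigma_1^2 - b_{12.3}\, n\sigma_1\sigma_2 r_{12} - b_{13.2}\, n\sigma_1\sigma_3 r_{13}\right] \\
&= \sigma_1^2 - b_{12.3}\,\sigma_1\sigma_2 r_{12} - b_{13.2}\,\sigma_1\sigma_3 r_{13} \\
&= \sigma_1^2\left[1 + r_{12}\frac{\Delta_{12}}{\Delta_{11}} + r_{13}\frac{\Delta_{13}}{\Delta_{11}}\right] \\
&= \sigma_1^2\,\frac{\Delta}{\Delta_{11}} \quad \ldots (1)
\end{aligned}$$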

where the symbols have their usual meanings, as stated for
relation (3) in § 7.7.

Similarly, it can be shown that var$(x_{2.13})$, i.e.
$\sigma_{2.13}^2 = \sigma_2^2\,\dfrac{\Delta}{\Delta_{22}}$, and
var$(x_{3.12})$, i.e. $\sigma_{3.12}^2 = \sigma_3^2\,\dfrac{\Delta}{\Delta_{33}}$.

Here it may also be noted that var$(x_{2.1})$, i.e.
$\sigma_{2.1}^2 = \sigma_2^2\,\dfrac{\Delta}{\Delta_{22}} = \sigma_2^2(1 - r_{12}^2)$, where
$\Delta = \begin{vmatrix} 1 & r_{12} \\ r_{21} & 1 \end{vmatrix}$ is now the 2×2 determinant,
and $\Delta_{22} = 1$. Thus evidently, it is a
residual variance of 1st order only and so may be compared with
the expression $S_y^2 = \sigma_y^2(1 - r^2)$ of (1) in example (8); provided we
treat $x_2$ as y, and $x_1$ as x. It means that $\sigma_{2.1}$, the s.d. of 1st order of
$x_2$ on $x_1$, may be termed as the s.e. of estimate of $x_2$ on $x_1$ when
the regression of $x_2$ (or $x_1$) on $x_1$ (or $x_2$) is linear.

Generalizing this trivariate case to that of p variates, we
can show that var$(x_{1.23 \ldots p})$, i.e.
$\sigma_{1.23 \ldots p}^2 = \sigma_1^2\,\dfrac{\Delta}{\Delta_{11}}$, where $\Delta_{11}$ is the
minor of the element in the 1st row and 1st col. of $\Delta$, the p×p
determinant of total correlation coefficients.

7.10 Determination of multiple correlation coefficients :

A multiple correlation coefficient of a variable on the group of
remaining variables under study can be defined as the amount of
total correlation between the observed and estimated values of the
variable on a regression plane.

Let us now consider a random sample of n sets of corresponding
values of the three variables $x_1$, $x_2$ and $x_3$ drawn from a
trivariate normal population. If the variables are measured from
their respective means and the quantities thus obtained are represented
by $x_1$, $x_2$ and $x_3$, then the multiple correlation coefficients of
these variables on the groups of remaining two variables can be
represented either in terms of residual variances or total correlation
coefficients. For example, the multiple correlation coefficient of
$x_1$ on $x_2$, $x_3$, by definition, can be seen as the total correlation
coefficient between $x_1$ and $x_{1e}\ (= b_{12.3}x_2 + b_{13.2}x_3 = x_1 - x_{1.23})$, where
$x_{1e}$ is the estimated value of $x_1$ for some specified values of $x_2$ and $x_3$
on the regression plane. It is generally denoted by the symbol $R_{1(23)}$.
Thus by definition, we have

$$\begin{aligned}
R_{1(23)} &= \frac{\text{cov}(x_1, x_{1e})}{\sigma_{x_1}\,\sigma_{x_{1e}}} = \frac{\sum x_1 (x_1 - x_{1.23})}{\sqrt{\sum x_1^2 \sum (x_1 - x_{1.23})^2}} \\
&= \frac{\sum x_1^2 - \sum x_1 x_{1.23}}{\sqrt{\sum x_1^2\left(\sum x_1^2 - 2\sum x_1 x_{1.23} + \sum x_{1.23}^2\right)}} \\
&= \frac{\sum x_1^2 - \sum x_{1.23}^2}{\sqrt{\sum x_1^2\left(\sum x_1^2 - \sum x_{1.23}^2\right)}} \quad \text{(by 2nd property)} \\
&= \frac{\sqrt{\sum x_1^2 - \sum x_{1.23}^2}}{\sqrt{\sum x_1^2}}
\end{aligned}$$

or

$$R_{1(23)} = \left(1 - \frac{\sum x_{1.23}^2}{\sum x_1^2}\right)^{1/2} = \left(1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}\right)^{1/2}, \quad \text{i.e.} \quad 1 - R_{1(23)}^2 = \sigma_{1.23}^2/\sigma_1^2 \quad \ldots (1)$$

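If in the above relation (1) we substitute the value of residual
variance $\sigma_{1.23}^2\ (= \sigma_1^2 \Delta/\Delta_{11})$, we can have the value of the multiple
correlation coefficient $R_{1(23)}$ in terms of total correlation coefficients as

$$R_{1(23)} = \left(1 - \frac{\Delta}{\Delta_{11}}\right)^{1/2}, \quad \text{i.e.} \quad 1 - R_{1(23)}^2 = \Delta/\Delta_{11} \quad \ldots (2)$$

where the symbols have their usual meanings.

Here it may also be noted that $0 \le R_{1(23)} \le 1$ always, since the
residual sum of squares $\sum x_{1.23}^2$ cannot be (-)ve and cannot exceed
$\sum x_1^2$. Further, if $R_{1(23)} = 0$, $x_1$ is uncorrelated
with any of the other variables $x_2$ and $x_3$. But if $R_{1(23)} = 1$, then we
have $\sigma_{1.23}^2 = 0$, i.e. all the residuals $x_{1.23}$ are zero, the observed and
estimated values of $x_1$ coincide and hence the observed $x_1$ is a linear
function of $x_2$ and $x_3$.

The multiple correlation coefficients of $x_2$ on $x_3$, $x_1$; and of $x_3$
on $x_1$, $x_2$ can be obtained similarly as

$$R_{2(31)} = \left(1 - \frac{\sigma_{2.13}^2}{\sigma_2^2}\right)^{1/2} \text{ or } \left(1 - \frac{\Delta}{\Delta_{22}}\right)^{1/2}; \qquad R_{3(12)} = \left(1 - \frac{\sigma_{3.12}^2}{\sigma_3^2}\right)^{1/2} \text{ or } \left(1 - \frac{\Delta}{\Delta_{33}}\right)^{1/2}.$$

Generalizing this trivariate case to that of p variates, we can show
that the multiple correlation coefficient of $x_1$ on $x_2, x_3, \ldots, x_p$ is

$$R_{1(23 \ldots p)} = \left(1 - \frac{\sigma_{1.23 \ldots p}^2}{\sigma_1^2}\right)^{1/2} \text{ or } \left(1 - \frac{\Delta}{\Delta_{11}}\right)^{1/2},$$

where $\Delta_{11}$ is the minor of the element in the 1st row and 1st col. of
$\Delta$, the p×p determinant of total correlation coefficients.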
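
For the trivariate case, the residual variance of § 7.9 and the multiple correlation coefficient can be computed together as sketched below. The function name `multiple_corr_3` and the illustrative values (total correlation coefficients 0.8, 0.6, 0.5 and s.d. of the first variable equal to 4) are our own assumptions.

```python
import math

def multiple_corr_3(r12, r13, r23, s1):
    """Return (residual variance of x1 on x2, x3; R_1(23)) for three variables."""
    # Determinant of total correlation coefficients and its leading cofactor
    delta = 1 - r12 ** 2 - r23 ** 2 - r13 ** 2 + 2 * r12 * r23 * r13
    d11 = 1 - r23 ** 2
    resid_var = s1 ** 2 * delta / d11       # sigma^2_{1.23} = sigma_1^2 * Delta / Delta_11
    R = math.sqrt(1 - delta / d11)          # R_1(23) = (1 - Delta/Delta_11)^(1/2)
    return resid_var, R

resid_var, R = multiple_corr_3(0.8, 0.6, 0.5, s1=4)
print(round(resid_var, 4), round(R, 4))

# Cross-check with the expanded form (r12^2 + r13^2 - 2 r12 r13 r23)/(1 - r23^2)
R2_alt = (0.8 ** 2 + 0.6 ** 2 - 2 * 0.8 * 0.6 * 0.5) / (1 - 0.5 ** 2)
print(round(R2_alt, 4))
```

The two routes to $R^2$ agree (about 0.693 here), and relation (1) is recovered since the residual variance divided by 16 equals $1 - R^2$.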

7.10.1 Testing the significance of an observed multiple
correlation coefficient R when R = 0 : Let R be the multiple
correlation coefficient of a variable on the group of other remaining
p variables in a random sample of n sets of values drawn from a
(p+1)-variate normal population with zero multiple correlation
coefficient. Then testing the significance of the observed R is the
same as testing the significance of difference (R - 0), or R = 0. Here
we usually test the hypothesis that the sample has been taken from
a (p+1)-variate normal population with zero multiple correlation
coefficient, i.e. R = 0. If the hyp. is true, we compute the statistic
(due to Fisher)

$$F = \frac{R^2}{1 - R^2}\cdot\frac{n - p - 1}{p}$$

which follows an 'F' distribution with (p, n - p - 1) d.f.

If $F \ge F_{.05}(p, n - p - 1)$, we reject the hyp. at 5% level, otherwise
the sample is said to be consistent with the hypothesis.

The test is based on the following assumptions— 

(1) The sample is a simple random sample.

(2) The sample may be large or small.

(3) The parent population is a (p+1)-variate normal.

(4) The population multiple correlation coefficient is zero.

(5) The quantities $R^2$ and $(1 - R^2)$ are distributed independently
like chi-squares with p and (n - p - 1) d.f. respectively.
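
The F-statistic for an observed multiple correlation coefficient is a one-line computation. In the sketch below the function name `multiple_corr_F` is our own, and the figures ($R^2$ = 0.52/0.75 on p = 2 other variables in a sample of n = 30) are purely illustrative; the critical value is to be read from the F-table for (p, n - p - 1) d.f.

```python
import math

def multiple_corr_F(R, n, p):
    """Return (F, (d.f.1, d.f.2)) for the hyp. that the population R is zero."""
    F = (R * R / (1 - R * R)) * ((n - p - 1) / p)
    return F, (p, n - p - 1)

R = math.sqrt(0.52 / 0.75)          # illustrative observed value, about 0.83
F, dfs = multiple_corr_F(R, n=30, p=2)
print(round(F, 2), dfs)
```

For these figures F is about 30.5 on (2, 27) d.f., to be compared with the tabulated 5% point.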

7.11 Determination of partial correlation coefficients :

A partial correlation coefficient between any two variables can be defined as the total correlation between these two variables after the linear effect of the group of remaining variables, as given by a regression plane, has been eliminated from both of the variables under study.

Let us now consider a random sample of n sets of corresponding values of the three variables $x_1, x_2$ and $x_3$ drawn from a trivariate normal population. If the variables are measured from their respective means and the quantities thus obtained are denoted by $x_1, x_2$ and $x_3$, then the partial correlation coefficient between any two variables, when the linear effect of the remaining third variable has been eliminated from both of them, can be represented in terms of correlation coefficients of lower order, i.e. the total correlation coefficients. For example, the partial correlation coefficient between $x_1$ and $x_2$, when the linear effect of $x_3$ has been eliminated from both of $x_1, x_2$, can be seen as the total correlation coefficient between $x_{1.3}\,(= x_1 - b_{13}x_3)$ and $x_{2.3}\,(= x_2 - b_{23}x_3)$. It is generally denoted by the symbol $r_{12.3}$, which is obviously a correlation coefficient of 1st order. Thus by definition, we have

$$r_{12.3} = \frac{\operatorname{cov}(x_{1.3}, x_{2.3})}{\sigma_{1.3}\,\sigma_{2.3}} = \left[\frac{\operatorname{cov}(x_{1.3}, x_{2.3})}{\sigma^2_{2.3}} \times \frac{\operatorname{cov}(x_{1.3}, x_{2.3})}{\sigma^2_{1.3}}\right]^{1/2} = (b_{12.3} \times b_{21.3})^{1/2}$$

$$= \left[\left(-\frac{\sigma_1}{\sigma_2}\frac{\Delta_{12}}{\Delta_{11}}\right)\left(-\frac{\sigma_2}{\sigma_1}\frac{\Delta_{21}}{\Delta_{22}}\right)\right]^{1/2} = \pm\frac{\Delta_{12}}{\sqrt{\Delta_{11}\Delta_{22}}} \qquad (\because \Delta_{12} = \Delta_{21}) \qquad (1)$$

or

$$r_{12.3} = \frac{-\Delta_{12}}{\sqrt{\Delta_{11}\Delta_{22}}} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (2)$$


In relation (1), out of the ± signs we have retained only the (-)ve sign for $r_{12.3}$, because $r_{12.3}$ bears the same sign as possessed by $b_{12.3}$ and $b_{21.3}$. The sign of $b_{12.3}$ (or $b_{21.3}$) is opposite to that of $\Delta_{12}$ (or $\Delta_{21}$), since $\Delta_{11}, \Delta_{22}$ are always (+)ve.

Here it may also be noted that $-1 \le r_{12.3} \le +1$ always. Further, $r_{12.3} = 0$ if and only if $r_{12} = r_{13}r_{23}$; in particular this holds when $x_1$ is uncorrelated with both of the other variables $x_2$ and $x_3$ (or $x_2$ is uncorrelated with both $x_3$ and $x_1$), i.e. when $r_{12} = 0$ and at least one of $r_{13}, r_{23}$ is zero. But if $r_{12.3} = \pm 1$, the three regression planes will coincide with each other, and hence the necessary and sufficient condition for the coincidence of the three regression planes is that $r_{12}^2 + r_{23}^2 + r_{31}^2 - 2r_{12}r_{23}r_{31} = 1$. We also have $r_{12.3} = r_{21.3}$, since a partial correlation coefficient is always symmetrical for the interchange of its primary subscripts provided the secondary subscripts remain the same.

The partial correlation coefficients between $x_2$ and $x_3$ after eliminating the effect of $x_1$; and between $x_3$ and $x_1$ after eliminating the effect of $x_2$, can be obtained similarly as

$$r_{23.1} = \frac{r_{23} - r_{21}r_{31}}{\sqrt{(1 - r_{21}^2)(1 - r_{31}^2)}} \quad\text{and}\quad r_{31.2} = \frac{r_{31} - r_{32}r_{12}}{\sqrt{(1 - r_{32}^2)(1 - r_{12}^2)}}.$$

If we further consider a similar case of four variables $x_1, x_2, x_3$ and $x_4$, then the partial correlation coefficient between $x_1$ and $x_2$ after eliminating the effects of $x_3$ and $x_4$ can be seen as the total correlation between $x_{1.34}$ and $x_{2.34}$ as

$$r_{12.34} = \frac{r_{12.4} - r_{13.4}\,r_{23.4}}{\sqrt{(1 - r_{13.4}^2)(1 - r_{23.4}^2)}},$$

which is a correlation coefficient of 2nd order expressed in terms of 1st order correlation coefficients.
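The step from one order to the next can be sketched as a recursion: each partial coefficient is built from three coefficients of the next lower order. The function below is an illustrative sketch only; its name, the dictionary layout for the total correlations and the four-variable figures are assumptions, while the three-variable figures are those of Exp. (12)[b].

```python
import math

def partial_corr(corr, i, j, controls=()):
    """Partial correlation r_ij.controls built up order by order.

    corr     : dict mapping frozenset({a, b}) to the total correlation r_ab
    i, j     : labels of the two primary variables
    controls : labels of the variables whose linear effects are eliminated
    """
    if not controls:
        return corr[frozenset((i, j))]
    k, rest = controls[0], tuple(controls[1:])
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

# First order, with the correlations of Exp. (12)[b].
corr3 = {frozenset((1, 2)): 0.8, frozenset((1, 3)): -0.4, frozenset((2, 3)): -0.56}
print(round(partial_corr(corr3, 1, 2, (3,)), 2))

# Second order, with hypothetical correlations among four variables.
corr4 = {frozenset((1, 2)): 0.5, frozenset((1, 3)): 0.4, frozenset((1, 4)): 0.3,
         frozenset((2, 3)): 0.3, frozenset((2, 4)): 0.2, frozenset((3, 4)): 0.1}
print(partial_corr(corr4, 1, 2, (3, 4)))
```

The order in which the secondary variables are eliminated does not affect the result, so `(3, 4)` and `(4, 3)` give the same coefficient.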

Generalizing it to a case of p variates, we can show that the partial correlation coefficient between $x_1$ and $x_2$, after the linear effects of the remaining (p-2) variables $x_3, x_4, \ldots, x_p$ have been eliminated from both $x_1$ and $x_2$, can be given as

$$r_{12.34\ldots p} = \frac{r_{12.45\ldots p} - r_{13.45\ldots p}\,r_{23.45\ldots p}}{\sqrt{(1 - r^2_{13.45\ldots p})(1 - r^2_{23.45\ldots p})}},$$

which is a correlation coefficient of (p-2)th order expressed in terms of (p-3)th order correlation coefficients. It can also be given by the relation

$$r_{12.q} = \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\sigma_{1.q}\,\sigma_{2.q}} = (b_{12.q} \times b_{21.q})^{1/2},$$

where q stands for the group of suffixes 3, 4, ..., p.

7.11.1 Testing the significance of an observed partial correlation coefficient r when ρ = 0 : Let r be the partial correlation coefficient between any two variables, after eliminating the effects of the remaining k variables, in a random sample of n sets of values drawn from a (k+2)-variate normal population with zero (corresponding) partial correlation coefficient. Or,

Let r be the kth order correlation coefficient in a random sample of n drawn from a (k+2)-variate normal population with the corresponding correlation coefficient as zero. Then testing the significance of the observed r is the same as testing the significance of the difference (r ~ 0), or ρ = 0. Here we usually test the hyp. that the sample has been taken from a (k+2)-variate normal population with zero correlation coefficient, i.e. ρ = 0. If the hyp. is true, we compute the statistic (due to Fisher)

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - k - 2)}},$$

which follows a t distribution with (n-k-2) d.f. Here the quantity $\sqrt{(1 - r^2)/(n - k - 2)}$ is the s.e. of r in a random sample of n. If the absolute value of this statistic, i.e. |t|, is greater than or equal to t.05(n-k-2), we reject the hyp. at 5% level, otherwise the sample is said to be consistent with the hypothesis.

The test is based on the following assumptions :

(1) The sample is a simple random one.

(2) The sample may be large or small.

(3) The parent population is (k+2)-variate normal.

(4) The population partial correlation coefficient is zero.
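The t test of a partial correlation coefficient can be sketched in a few lines. The function name is an assumption; the critical value must be read from a t table, and the figures used are those of Exp. (12)[b] (r = 0.76 of order k = 1 in a sample of n = 28, with t.05(25) = 2.06).

```python
import math

def t_statistic_partial_r(r, n, k):
    """t = r / sqrt((1 - r^2)/(n - k - 2)) with (n - k - 2) d.f."""
    df = n - k - 2
    if df <= 0:
        raise ValueError("need n > k + 2")
    standard_error = math.sqrt((1 - r**2) / df)
    return r / standard_error, df

t, df = t_statistic_partial_r(0.76, 28, 1)
t_table = 2.06                      # t.05(25), read from a table
print(round(t, 2), df, "reject" if abs(t) >= t_table else "consistent")
```

Setting k = 0 gives the usual test for a zero order correlation coefficient with (n-2) d.f.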

Note (1) : If ρ = 0, then the statistic required to test the significance of a kth order correlation coefficient in a random sample of n from a (k+2)-variate normal population is analogous to that of a zero order correlation coefficient from a bivariate normal population with the sample size n reduced by k. Thus a test of significance for a partial correlation coefficient with k secondary subscripts can easily be obtained from that discussed in § 7.3(a) simply by replacing the quantity (n-2) by (n-k-2).

Note (2) : If ρ ≠ 0, then the statistics discussed in § 7.3(b-1, 2, 3) also hold equally for partial correlation coefficients with the mere change that the sample sizes are further reduced by the number of secondary subscripts of the partial correlation coefficients under study. Thus the tests of significance for the correlation coefficients of order k can easily be obtained from those discussed in § 7.3(b-1, 2, 3) simply by replacing the quantities $(n - 3)$, $(n_i - 3)$ by $(n - k - 3)$, $(n_i - k - 3)$ respectively.

7.12 Determination of partial regression coefficients : Let us consider a random sample of n sets of corresponding values of the three variables $x_1, x_2$ and $x_3$ drawn from a trivariate normal population. If these variables are measured from their respective means $\bar{x}_1, \bar{x}_2$ and $\bar{x}_3$, and the quantities thus obtained are represented by $x_1, x_2$ and $x_3$, then a multiple regression equation of $x_1$ on $x_2, x_3$ can be given as

$$x_{1e} = a + b_{12.3}\,x_2 + b_{13.2}\,x_3 \qquad (1)$$

where a and the b's are the constants of regression and $x_{1e}$ is the best estimate of $x_1$ for some given values of $x_2$ and $x_3$. The constants $b_{12.3}$ and $b_{13.2}$ are the partial regression coefficients of the above said regression equation. The coefficients of regression along with the constant a need to be determined in a way so as to give on the average the best estimate of $x_1$ corresponding to any assigned values of $x_2$ and $x_3$. These coefficients can be determined in terms of

(a) s.ds. and correlation coefficients of zero orders, and

(b) s.ds. and correlation coefficients of higher than zero orders, i.e. partial s.ds. and partial correlation coefficients.

(a) b's in terms of σ's and r's of zero orders :

In order to determine the regression coefficients $b_{12.3}$ and $b_{13.2}$ in terms of s.ds. and correlation coefficients of zero orders, we have to find the regression constants a and b's such that

$$R = \sum (x_1 - x_{1e})^2 = \sum (x_1 - a - b_{12.3}x_2 - b_{13.2}x_3)^2 = \sum x_{1.23}^2 \qquad (2)$$

is a minimum, so that $x_{1e}$ is the best estimate of $x_1$ for some given values of $x_2$ and $x_3$. This object can be attained by employing the principle of least squares. According to this principle, we partially differentiate the residual sum of squares of $x_1$, given by (2), w.r.t. a and the b's and then equate each to zero in order to have the following three normal equations :

$$\text{(i)}\quad \frac{\partial R}{\partial a} = -2\sum (x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0$$

$$\text{(ii)}\quad \frac{\partial R}{\partial b_{12.3}} = -2\sum x_2(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0$$

$$\text{(iii)}\quad \frac{\partial R}{\partial b_{13.2}} = -2\sum x_3(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0$$

or

$$\sum (x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0 = \sum x_{1.23}$$

$$\sum x_2(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0 = \sum x_2\,x_{1.23}$$

$$\sum x_3(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0 = \sum x_3\,x_{1.23}$$

Considering the summation over all n sets of values, we have

$$\sum x_1 - na - b_{12.3}\sum x_2 - b_{13.2}\sum x_3 = 0$$

$$\sum x_2x_1 - a\sum x_2 - b_{12.3}\sum x_2^2 - b_{13.2}\sum x_2x_3 = 0$$

$$\sum x_3x_1 - a\sum x_3 - b_{12.3}\sum x_3x_2 - b_{13.2}\sum x_3^2 = 0$$

or

a = 0, since $\sum x_1 = \sum x_2 = \sum x_3 = 0$, and

$$nr_{12}\sigma_1\sigma_2 - 0 - b_{12.3}\,n\sigma_2^2 - b_{13.2}\,nr_{23}\sigma_2\sigma_3 = 0$$

$$nr_{31}\sigma_3\sigma_1 - 0 - b_{12.3}\,nr_{32}\sigma_3\sigma_2 - b_{13.2}\,n\sigma_3^2 = 0$$

Thus we have a = 0 from (i), and $b_{12.3}$, $b_{13.2}$ can be obtained by simplifying the last two equations :

$$r_{12}\sigma_1 - b_{12.3}\sigma_2 - b_{13.2}r_{23}\sigma_3 = 0$$

$$r_{31}\sigma_1 - b_{12.3}r_{32}\sigma_2 - b_{13.2}\sigma_3 = 0$$

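Now we get

$$b_{12.3} = \frac{\begin{vmatrix} r_{12}\sigma_1 & r_{23}\sigma_3 \\ r_{31}\sigma_1 & \sigma_3 \end{vmatrix}}{\begin{vmatrix} \sigma_2 & r_{23}\sigma_3 \\ r_{32}\sigma_2 & \sigma_3 \end{vmatrix}} = \frac{\sigma_1\sigma_3}{\sigma_2\sigma_3}\cdot\frac{\begin{vmatrix} r_{12} & r_{23} \\ r_{31} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{32} & 1 \end{vmatrix}} = -\frac{\sigma_1}{\sigma_2}\frac{\Delta_{12}}{\Delta_{11}},$$

and

$$b_{13.2} = \frac{\begin{vmatrix} \sigma_2 & r_{12}\sigma_1 \\ r_{32}\sigma_2 & r_{31}\sigma_1 \end{vmatrix}}{\begin{vmatrix} \sigma_2 & r_{23}\sigma_3 \\ r_{32}\sigma_2 & \sigma_3 \end{vmatrix}} = \frac{\sigma_1\sigma_2}{\sigma_2\sigma_3}\cdot\frac{\begin{vmatrix} 1 & r_{12} \\ r_{32} & r_{31} \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{32} & 1 \end{vmatrix}} = -\frac{\sigma_1}{\sigma_3}\frac{\Delta_{13}}{\Delta_{11}},$$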

where $\Delta_{11}, \Delta_{12}, \Delta_{13}$ are the respective cofactors of the elements in the 1st row and 1st col.; 1st row and 2nd col.; 1st row and 3rd col. of the 3×3 determinant $\Delta$ of total correlation coefficients. Thus the values of $b_{12.3}$ and $b_{13.2}$ are expressible in terms of σ's and r's of zero orders. Similarly, it can be shown that

$$b_{23.1} = -\frac{\sigma_2}{\sigma_3}\frac{\Delta_{23}}{\Delta_{22}}, \quad\text{and}\quad b_{21.3} = -\frac{\sigma_2}{\sigma_1}\frac{\Delta_{21}}{\Delta_{22}}$$

etc. Generalizing this trivariate case to that of p variates, we can show that

$$b_{12.34\ldots p} = -\frac{\sigma_1}{\sigma_2}\frac{\Delta_{12}}{\Delta_{11}},$$

where $\Delta_{11}, \Delta_{12}$ are the respective cofactors of the elements in the 1st row and 1st col.; 1st row and 2nd col. of the $p \times p$ determinant $\Delta$ of total correlation coefficients.
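The least-squares argument of this section can be checked numerically. The sketch below solves the two normal equations directly from raw data and, separately, evaluates the cofactor expressions; both routes should give the same coefficients. The function names and the data are hypothetical (the data are constructed so that x1 = 1 + 2·x2 + 3·x3 exactly).

```python
import math

def deviations(values):
    """Measure a variable from its own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def partial_regression_from_data(x1, x2, x3):
    """Solve the two normal equations in b12.3 and b13.2 (a = 0)."""
    d1, d2, d3 = deviations(x1), deviations(x2), deviations(x3)
    s22 = sum(u * u for u in d2)
    s33 = sum(u * u for u in d3)
    s23 = sum(u * v for u, v in zip(d2, d3))
    s12 = sum(u * v for u, v in zip(d1, d2))
    s13 = sum(u * v for u, v in zip(d1, d3))
    det = s22 * s33 - s23**2          # zero only if x2 and x3 are collinear
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    return b12_3, b13_2

def partial_regression_from_cofactors(x1, x2, x3):
    """Same coefficients via -(sigma_1/sigma_j) * Delta_1j / Delta_11."""
    d1, d2, d3 = deviations(x1), deviations(x2), deviations(x3)
    n = len(x1)

    def sd(d):
        return math.sqrt(sum(u * u for u in d) / n)

    def corr(a, b):
        return sum(u * v for u, v in zip(a, b)) / (n * sd(a) * sd(b))

    s1, s2, s3 = sd(d1), sd(d2), sd(d3)
    r12, r13, r23 = corr(d1, d2), corr(d1, d3), corr(d2, d3)
    delta_11 = 1 - r23**2
    delta_12 = r13 * r23 - r12        # cofactor of the (1,2) element
    delta_13 = r12 * r23 - r13        # cofactor of the (1,3) element
    return (-(s1 / s2) * delta_12 / delta_11,
            -(s1 / s3) * delta_13 / delta_11)

# Hypothetical data with an exact linear relation.
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 6]
x1 = [1 + 2 * u + 3 * v for u, v in zip(x2, x3)]
print(partial_regression_from_data(x1, x2, x3))
print(partial_regression_from_cofactors(x1, x2, x3))
```

Both calls return coefficients close to 2 and 3, up to floating-point rounding.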

(b) b's in terms of σ's and r's of higher than zero orders :

In order to express the regression coefficients $b_{12.3}$ and $b_{13.2}$ in terms of s.ds. and correlation coefficients of higher than zero orders, i.e. in partial values of σ's and r's, we have to consider the relations

$$\sum x_{2.3}\,x_{1.23} = 0 \quad (1), \quad\text{and}\quad \sum x_{3.2}\,x_{1.23} = 0 \quad (2) \quad\text{(by 3rd property)}.$$

From (1), we have

$$\sum x_{2.3}(x_1 - b_{12.3}x_2 - b_{13.2}x_3) = 0$$

or

$$\sum x_{2.3}x_1 - b_{12.3}\sum x_2x_{2.3} - b_{13.2}\sum x_3x_{2.3} = 0$$

or

$$\sum x_{1.3}x_{2.3} - b_{12.3}\sum x_{2.3}^2 - 0 = 0 \quad\text{(by 2nd, 1st properties)}$$

or

$$b_{12.3} = \frac{\sum x_{1.3}x_{2.3}}{\sum x_{2.3}^2}, \quad\text{or}\quad \frac{\operatorname{cov}(x_{1.3}, x_{2.3})}{\operatorname{var}(x_{2.3})}$$

$$\therefore\ b_{12.3} = r_{12.3}\,\sigma_{1.3}/\sigma_{2.3} \qquad \text{(i)}$$

Similarly from (2), we have

$$b_{13.2} = \frac{\sum x_{1.2}x_{3.2}}{\sum x_{3.2}^2}, \quad\text{or}\quad \frac{\operatorname{cov}(x_{1.2}, x_{3.2})}{\operatorname{var}(x_{3.2})}$$

$$\therefore\ b_{13.2} = r_{13.2}\,\sigma_{1.2}/\sigma_{3.2} \qquad \text{(ii)}$$

Thus from (i), (ii) we see that the two regression coefficients of order one each can be expressed in terms of σ's and r's each of order one, i.e. in partial values of σ's and r's.

Similarly, it can be shown that

$$b_{23.1} = \frac{\sum x_{2.1}x_{3.1}}{\sum x_{3.1}^2}, \quad\text{or}\quad \frac{\operatorname{cov}(x_{2.1}, x_{3.1})}{\operatorname{var}(x_{3.1})}, \quad\text{i.e. } r_{23.1}\,\sigma_{2.1}/\sigma_{3.1};$$

$$\text{and}\quad b_{21.3} = \frac{\sum x_{2.3}x_{1.3}}{\sum x_{1.3}^2}, \quad\text{or}\quad \frac{\operatorname{cov}(x_{2.3}, x_{1.3})}{\operatorname{var}(x_{1.3})}, \quad\text{i.e. } r_{21.3}\,\sigma_{2.3}/\sigma_{1.3}$$

etc. Generalizing this trivariate case to that of p variates, we can show that

$$b_{12.q} = \frac{\sum x_{1.q}x_{2.q}}{\sum x_{2.q}^2}, \quad\text{or}\quad \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\operatorname{var}(x_{2.q})}, \quad\text{i.e. } r_{12.q}\,\sigma_{1.q}/\sigma_{2.q},$$

where q stands for the group of suffixes 3, 4, ..., p. Thus we observe that a regression coefficient of order (p-2) is expressible in terms of σ's and r's each of order (p-2).

We, therefore, arrive at the conclusion that a partial regression coefficient can be expressed in terms of s.ds. and correlation coefficients of zero orders as well as of higher orders.

Exp. (10). [a] Show that the coefficient of correlation lies between -1 and +1.

[b] For given $r_{12}, r_{13}$ find the range of $r_{23}$ and also comment on $r_{12.3} = 0, 1$.

[c] (i) If $r_{12} = k$, $r_{23} = -k$, prove that $-1 \le r_{13} \le 1 - 2k^2$.

(ii) If $r_{23} = 0$, prove that $R^2_{1(23)} = r_{12}^2 + r_{13}^2$; and $\sigma^2_{1.23} = \sigma_1^2(1 - r_{12}^2 - r_{13}^2)$.

(iii) If $r_{23} = 1$, prove that $r_{12}^2 = r_{13}^2$; and $\sigma^2_{1.23} = \sigma_1^2(1 - r_{12}^2)$.
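Sol. [a] Let r be the correlation coefficient between x and y for a random sample of n pairs drawn from a bivariate normal population. If $\bar{x}, \bar{y}$ denote the sample means and $\sigma_x, \sigma_y$ denote the sample s.ds. of the variables x and y, then by the property of sum of squares, we have

$$S_1 = \frac{1}{2n}\sum\left[\frac{x - \bar{x}}{\sigma_x} - \frac{y - \bar{y}}{\sigma_y}\right]^2 \ge 0$$

or

$$\frac{1}{2}\left[\frac{\sum (x - \bar{x})^2}{n\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{n\sigma_y^2} - 2\,\frac{\sum (x - \bar{x})(y - \bar{y})/n}{\sigma_x\sigma_y}\right] \ge 0$$

or

$$\frac{1}{2}\left[\frac{n\sigma_x^2}{n\sigma_x^2} + \frac{n\sigma_y^2}{n\sigma_y^2} - \frac{2\operatorname{cov}(x, y)}{\sigma_x\sigma_y}\right] \ge 0$$

or

$$\tfrac{1}{2}\,[1 + 1 - 2r] \ge 0$$

or

$$1 - r \ge 0, \quad\therefore\ r \le +1 \qquad (1)$$

Similarly, by considering the quantity

$$S_2 = \frac{1}{2n}\sum\left[\frac{x - \bar{x}}{\sigma_x} + \frac{y - \bar{y}}{\sigma_y}\right]^2 \ge 0, \quad\text{we have}\quad 1 + r \ge 0$$

$$\therefore\ r \ge -1 \qquad (2)$$

Combining the two results obtained in (1), (2) we see that $-1 \le r \le +1$. Thus the coefficient of correlation between any two variables always lies between -1 and +1. Hence proved.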
[b] We know that

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

or

$$r^2_{12.3} = \frac{(r_{12} - r_{13}r_{23})^2}{(1 - r_{13}^2)(1 - r_{23}^2)} \le 1$$

i.e.

$$(r_{12} - r_{13}r_{23})^2 \le (1 - r_{13}^2)(1 - r_{23}^2)$$

or

$$r_{23}^2 - 2r_{12}r_{13}r_{23} + (r_{12}^2 + r_{13}^2 - 1) \le 0 \qquad (1)$$

If $r_{12}, r_{13}$ are known, then (1) is analogous to the quadratic inequality $ax^2 + bx + c \le 0$, whose limits are given by

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},$$

where a, b and c are known constants.

Thus from (1), we have

$$r_{23} = \frac{2r_{12}r_{13} \pm \sqrt{4r_{12}^2r_{13}^2 - 4(r_{12}^2 + r_{13}^2 - 1)}}{2}$$

or

$$r_{23} = r_{12}r_{13} \pm \sqrt{r_{12}^2r_{13}^2 - r_{12}^2 - r_{13}^2 + 1} \qquad (2)$$

Thus the limits of $r_{23}$ will be $r_{12}r_{13} \pm \sqrt{r_{12}^2r_{13}^2 - r_{12}^2 - r_{13}^2 + 1}$, giving its range from $\left[r_{12}r_{13} - \sqrt{r_{12}^2r_{13}^2 - r_{12}^2 - r_{13}^2 + 1}\right]$ to $\left[r_{12}r_{13} + \sqrt{r_{12}^2r_{13}^2 - r_{12}^2 - r_{13}^2 + 1}\right]$. Here the quantity under the root is equal to $(1 - r_{12}^2)(1 - r_{13}^2)$.

Further we see that if $r_{12.3} = 0$, then

$$\frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = 0,$$

which is possible only when $r_{12} = r_{13}r_{23}$. In particular, it holds when $r_{12} = 0$ and $r_{13}$ (or $r_{23}$) $= 0$, which means that $x_1$ (or $x_2$) is uncorrelated with $x_2$ (or $x_1$) and $x_3$. The relation $r_{12} = r_{13}r_{23}$ thus gives us the necessary and sufficient (n & s) condition for $r_{12.3}$ to vanish in a trivariate normal distribution.

Also, if $r_{12.3} = 1$, then

$$\frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = 1, \quad\text{i.e.}\quad r_{12}^2 + r_{13}^2 + r_{23}^2 - 2r_{12}r_{23}r_{31} = 1,$$

which is only possible when the three planes coincide. This gives us the n & s condition for the coincidence of the three regression planes for a trivariate normal distribution.
[c] (i) If $r_{12}, r_{23}$ are known, then the limits of $r_{13}$ will be

$$r_{12}r_{23} \pm \sqrt{r_{12}^2r_{23}^2 - r_{12}^2 - r_{23}^2 + 1}, \quad\text{as obvious from (2) of [b]},$$

or $k(-k) \pm \sqrt{k^4 - k^2 - k^2 + 1}$ (since $r_{12} = k$, $r_{23} = -k$)

or $-k^2 \pm \sqrt{(1 - k^2)^2}$; or $-k^2 \pm (1 - k^2)$

i.e. $-k^2 - (1 - k^2)$ and $-k^2 + (1 - k^2)$; or $-1$ and $1 - 2k^2$.

$\therefore\ -1 \le r_{13} \le 1 - 2k^2$. Hence proved.

(ii) Since we know that $R_{1(23)}$ and the r's are related by the equation

$$R^2_{1(23)} = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2},$$

so $R^2_{1(23)} = r_{12}^2 + r_{13}^2$, when $r_{23} = 0$.

Also, $1 - R^2_{1(23)} = 1 - (r_{12}^2 + r_{13}^2)$; or $\sigma^2_{1.23}/\sigma_1^2 = (1 - r_{12}^2 - r_{13}^2)$

$\therefore\ \sigma^2_{1.23} = \sigma_1^2(1 - r_{12}^2 - r_{13}^2)$. Hence proved.

(iii) As we know that

$$R^2_{1(23)}(1 - r_{23}^2) = r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31},$$

so $0 = r_{12}^2 + r_{13}^2 - 2r_{12}r_{31}$, when $r_{23} = 1$,

or $0 = (r_{12} - r_{13})^2$

$\therefore\ r_{12} = r_{13}$, and consequently $r_{12}^2 = r_{13}^2$.

Aliter : Since we know that

$$r_{23} = r_{12}r_{13} \pm \sqrt{r_{12}^2r_{13}^2 - r_{12}^2 - r_{13}^2 + 1},$$

so $(r_{23} - r_{12}r_{13})^2 = (r_{12}^2r_{13}^2 - r_{12}^2 - r_{13}^2 + 1)$

or $(1 - r_{12}r_{13})^2 = (r_{12}^2r_{13}^2 - r_{12}^2 - r_{13}^2 + 1)$, when $r_{23} = 1$,

or $(r_{12} - r_{13})^2 = 0$, i.e. $r_{12} = r_{13}$. Hence proved.

Further, as we know that

$$\sigma^2_{1.23} = \sigma_1^2(1 - r_{12}^2)(1 - r^2_{13.2}) = \sigma_1^2(1 - r_{12}^2)\left[1 - \frac{(r_{13} - r_{12}r_{32})^2}{(1 - r_{12}^2)(1 - r_{32}^2)}\right],$$

so $\sigma^2_{1.23} = \sigma_1^2(1 - r_{12}^2)$, when $r_{23} = 1$ (with $r_{13} = r_{12}$, the variable $x_3$ adds nothing to the regression of $x_1$ on $x_2$, and the factor involving $r_{13.2}$ reduces to unity). Hence proved.
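The limits found in [b] and [c](i) can be sketched numerically, using the identity that the quantity under the root equals the product (1 - r_a²)(1 - r_b²). The function name and the value k = 0.5 are illustrative assumptions.

```python
import math

def third_correlation_range(r_a, r_b):
    """Limits of the third total correlation when the other two are given.

    The limits are r_a*r_b -/+ sqrt((1 - r_a^2)(1 - r_b^2)).
    """
    centre = r_a * r_b
    half_width = math.sqrt((1 - r_a**2) * (1 - r_b**2))
    return centre - half_width, centre + half_width

# Case [c](i) with the illustrative value k = 0.5: r12 = k, r23 = -k.
k = 0.5
low, high = third_correlation_range(k, -k)
print(low, high)      # limits of r13; compare with -1 and 1 - 2k^2
```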

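Exp. (11). [a] Show that

$$R^2_{1(23)} = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2} \ge r_{12}^2.$$

Hence or otherwise establish the result

$$(1 - r_{12}^2)(1 - r^2_{13.2})(1 - r^2_{14.23}) \cdots (1 - r^2_{1p.23\ldots(p-1)}) = 1 - R^2_{1(23\ldots p)}.$$

[b] Prove that

$$R^2_{1(23)} = b_{12.3}\,r_{12}\,\frac{\sigma_2}{\sigma_1} + b_{13.2}\,r_{13}\,\frac{\sigma_3}{\sigma_1}.$$

[c] (i) Show that $b_{12.3} = (b_{12} - b_{13}b_{32})/(1 - b_{23}b_{32})$.

(ii) Show that $r_{12.3}\,\dfrac{\sigma_{1.3}}{\sigma_{2.3}} = \dfrac{\sigma_1}{\sigma_2}\left(\dfrac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}\right)$.

(iii) Show that $b_{12.3}\,b_{23.1}\,b_{31.2} = r_{12.3}\,r_{23.1}\,r_{31.2}$.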

Sol. [a] We know that

$$1 - R^2_{1(23)} = \frac{\Delta}{\Delta_{11}} = \frac{\Delta_{11} + r_{12}\Delta_{12} + r_{13}\Delta_{13}}{\Delta_{11}} = \frac{(1 - r_{23}^2) + r_{12}(r_{31}r_{32} - r_{12}) + r_{13}(r_{12}r_{23} - r_{31})}{1 - r_{23}^2}$$

$$= \frac{1 - r_{23}^2 - r_{12}^2 - r_{13}^2 + 2r_{12}r_{23}r_{31}}{1 - r_{23}^2} \qquad (1)$$

or

$$1 - R^2_{1(23)} = 1 - \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2}$$

$$\therefore\ R^2_{1(23)} = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2} \qquad (2)$$

Further by adding and subtracting the quantity $r_{12}^2r_{23}^2$ in the numerator of (1), we have

$$1 - R^2_{1(23)} = \frac{1 - r_{23}^2 - r_{12}^2 + r_{12}^2r_{23}^2 - r_{12}^2r_{23}^2 - r_{13}^2 + 2r_{12}r_{23}r_{31}}{1 - r_{23}^2}$$

$$= \frac{(1 - r_{12}^2)(1 - r_{23}^2) - (r_{12}^2r_{23}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31})}{1 - r_{23}^2}$$

$$= (1 - r_{12}^2)\,\frac{(1 - r_{12}^2)(1 - r_{23}^2) - (r_{13} - r_{12}r_{23})^2}{(1 - r_{12}^2)(1 - r_{23}^2)}$$

$$= (1 - r_{12}^2)\left[1 - \frac{(r_{13} - r_{12}r_{23})^2}{(1 - r_{12}^2)(1 - r_{23}^2)}\right]$$

$$\therefore\ 1 - R^2_{1(23)} = (1 - r_{12}^2)(1 - r^2_{13.2}) \qquad (3)$$

Thus from (3), it is obvious that $1 - R^2_{1(23)} \le 1 - r_{12}^2$; i.e. $R^2_{1(23)} \ge r_{12}^2$

or

$$R^2_{1(23)} \ge r_{12}^2 \qquad (4)$$

Combining the two results obtained in (2), (4) we see that

$$R^2_{1(23)} = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2} \ge r_{12}^2. \quad\text{Hence proved.}$$

Again considering the relation (3), we have

$$\sigma^2_{1.23}/\sigma_1^2 = 1 - R^2_{1(23)} = (1 - r_{12}^2)(1 - r^2_{13.2}).$$

Generalizing this trivariate case to that of p variates, we have

$$\sigma^2_{1.23\ldots p}/\sigma_1^2 = (1 - r_{12}^2)(1 - r^2_{13.2})(1 - r^2_{14.23}) \cdots (1 - r^2_{1p.23\ldots(p-1)}) = 1 - R^2_{1(23\ldots p)}. \quad\text{Hence the result.}$$

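[b] Since we know that

$$1 - R^2_{1(23)} = \frac{\Delta}{\Delta_{11}} = \frac{\Delta_{11} + r_{12}\Delta_{12} + r_{13}\Delta_{13}}{\Delta_{11}},$$

so

$$R^2_{1(23)} = r_{12}\left(-\frac{\Delta_{12}}{\Delta_{11}}\right) + r_{13}\left(-\frac{\Delta_{13}}{\Delta_{11}}\right) = r_{12}\left(\frac{\sigma_2}{\sigma_1}\,b_{12.3}\right) + r_{13}\left(\frac{\sigma_3}{\sigma_1}\,b_{13.2}\right),$$

since $b_{12.3} = -\dfrac{\sigma_1}{\sigma_2}\dfrac{\Delta_{12}}{\Delta_{11}}$ and $b_{13.2} = -\dfrac{\sigma_1}{\sigma_3}\dfrac{\Delta_{13}}{\Delta_{11}}$.

$$\therefore\ R^2_{1(23)} = b_{12.3}\,r_{12}\,\frac{\sigma_2}{\sigma_1} + b_{13.2}\,r_{13}\,\frac{\sigma_3}{\sigma_1}. \quad\text{Hence proved.}$$

[c] (i) We know that

$$b_{12.3} = -\frac{\sigma_1}{\sigma_2}\frac{\Delta_{12}}{\Delta_{11}} = \frac{\sigma_1}{\sigma_2}\cdot\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} = \frac{r_{12}\dfrac{\sigma_1}{\sigma_2} - r_{13}\dfrac{\sigma_1}{\sigma_3}\cdot r_{32}\dfrac{\sigma_3}{\sigma_2}}{1 - r_{23}\dfrac{\sigma_2}{\sigma_3}\cdot r_{32}\dfrac{\sigma_3}{\sigma_2}} = \frac{b_{12} - b_{13}b_{32}}{1 - b_{23}b_{32}},$$

where $b_{ij} = r_{ij}\,\sigma_i/\sigma_j$ is the zero order regression coefficient of $x_i$ on $x_j$.

$\therefore\ b_{12.3} = (b_{12} - b_{13}b_{32})/(1 - b_{23}b_{32})$. Hence proved.

(ii) We have

$$r_{12.3}\,\frac{\sigma_{1.3}}{\sigma_{2.3}} = b_{12.3} = -\frac{\sigma_1}{\sigma_2}\frac{\Delta_{12}}{\Delta_{11}} = -\frac{\sigma_1}{\sigma_2}\cdot\frac{r_{13}r_{23} - r_{12}}{1 - r_{23}^2}$$

$$\therefore\ r_{12.3}\,\frac{\sigma_{1.3}}{\sigma_{2.3}} = \frac{\sigma_1}{\sigma_2}\left(\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}\right). \quad\text{Hence proved.}$$

(iii) We know that

$$b_{12.3}\,b_{23.1}\,b_{31.2} = \frac{\operatorname{cov}(x_{1.3}, x_{2.3})}{\operatorname{var}(x_{2.3})} \times \frac{\operatorname{cov}(x_{2.1}, x_{3.1})}{\operatorname{var}(x_{3.1})} \times \frac{\operatorname{cov}(x_{3.2}, x_{1.2})}{\operatorname{var}(x_{1.2})}$$

$$= \frac{r_{12.3}\,\sigma_{1.3}\,\sigma_{2.3}}{\sigma^2_{2.3}} \times \frac{r_{23.1}\,\sigma_{2.1}\,\sigma_{3.1}}{\sigma^2_{3.1}} \times \frac{r_{31.2}\,\sigma_{3.2}\,\sigma_{1.2}}{\sigma^2_{1.2}}$$

$$= r_{12.3}\,\frac{\sigma_{1.3}}{\sigma_{2.3}} \times r_{23.1}\,\frac{\sigma_{2.1}}{\sigma_{3.1}} \times r_{31.2}\,\frac{\sigma_{3.2}}{\sigma_{1.2}}$$

$$= r_{12.3}\,\frac{\sigma_1\sqrt{1 - r_{13}^2}}{\sigma_2\sqrt{1 - r_{23}^2}} \times r_{23.1}\,\frac{\sigma_2\sqrt{1 - r_{21}^2}}{\sigma_3\sqrt{1 - r_{31}^2}} \times r_{31.2}\,\frac{\sigma_3\sqrt{1 - r_{32}^2}}{\sigma_1\sqrt{1 - r_{12}^2}}$$

$$\therefore\ b_{12.3}\,b_{23.1}\,b_{31.2} = r_{12.3}\,r_{23.1}\,r_{31.2}. \quad\text{Hence proved.}$$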

Exp. (12). [a] Show that the values : $r_{12} = 0.6$, $r_{23} = 0.8$, and $r_{31} = -0.5$ are inconsistent.

[b] In a sample of 28 from a trivariate normal population, it has been found that $r_{12} = .8$, $r_{13} = -.4$ and $r_{23} = -.56$. Compute $r_{12.3}$, $R_{1(23)}$, and test their significance.

[c] From independent samples of 31 and 22 sets of values, partial correlations of order three are found to be .4 and .6 respectively. Examine

(i) whether the first sample could have come from a population with the corresponding correlation coefficient of 0.6;

(ii) whether the two samples could have come from the same normal population.

Sol. [a] If a partial correlation coefficient computed from the given values $r_{12} = 0.6$, $r_{23} = 0.8$, and $r_{31} = -0.5$ lies between -1 and +1, then only the values are said to be consistent, otherwise inconsistent. Let us compute, for the purpose, the partial correlation coefficient $r_{12.3}$ as follows :

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.6 - (-0.5)(0.8)}{\sqrt{(1 - .25)(1 - .64)}} = \frac{0.60 + 0.40}{\sqrt{(0.75)(0.36)}} = \frac{1.0}{0.5196} > 1,$$

hence inadmissible.

Conclusion : The given values are inconsistent.
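An equivalent check, sketched below, uses the determinant Δ of the total correlation coefficients. Inequality (1) of Exp. (10)[b] can be rearranged to Δ ≥ 0, so a negative determinant signals inconsistent values. The function name and the tolerance are assumptions made for illustration.

```python
def correlations_consistent(r12, r13, r23, tol=1e-12):
    """True when three total correlations can belong to one set of data."""
    if any(abs(r) > 1 for r in (r12, r13, r23)):
        return False
    # Determinant of the 3x3 matrix of total correlation coefficients.
    delta = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    return delta >= -tol

print(correlations_consistent(0.6, -0.5, 0.8))     # values of Exp. (12)[a]
print(correlations_consistent(0.8, -0.4, -0.56))   # values of Exp. (12)[b]
```

For the values of [a] the determinant is negative, which agrees with the conclusion reached above through $r_{12.3}$.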


[b] We have

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.8 - (-.4)(-.56)}{\sqrt{(1 - .16)(1 - .31)}} = 0.76$$

and

$$R^2_{1(23)} = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2} = \frac{.64 + .16 - 2(.8)(-.4)(-.56)}{1 - .31} = 0.64, \quad\therefore\ R_{1(23)} = .8.$$

For testing the significance of $r_{12.3}$ (= 0.76), we take the hyp. (H₀) : ρ = 0.

Here we compute the statistic

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - k - 2)}} = \frac{0.76}{\sqrt{\{1 - (.76)^2\}/(28 - 1 - 2)}} = \frac{0.76 \times 5}{\sqrt{.4224}} = \frac{3.80}{0.65} = 5.85$$

Now we have |t| = 5.85 and t.05(25) = 2.06.

So |t| > t.05, leading to the rejection of the hyp. at 5% level.

Similarly for testing the significance of $R_{1(23)}$ = .80, we take the hyp. (H₀) : R = 0.

Here we compute the statistic

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n - p - 1}{p} = \frac{0.64}{1 - 0.64} \cdot \frac{28 - 2 - 1}{2} = \frac{0.64 \times 25}{0.36 \times 2} = 22.22$$

Now we have F = 22.22 and F.05(2, 25) = 3.38.

So F > F.05, leading to the rejection of the hyp. at 5% level.

Conclusion : The computed values of both $r_{12.3}$ and $R_{1(23)}$ are significant at 5% level of significance.

[c] (i) Here we are given : r = 0.4, k = 3, n = 31; and want to test the hyp. (H₀) : ρ = 0.6.

Thus we compute the statistic

$$Z = \frac{z - \zeta}{1/\sqrt{n - k - 3}}, \quad\text{where}\quad z = 1.1513\log_{10}\left(\frac{1 + r}{1 - r}\right) = .42, \quad\text{and}\quad \zeta = 1.1513\log_{10}\left(\frac{1 + \rho}{1 - \rho}\right) = .69.$$

$$\therefore\ Z = \frac{.42 - .69}{1/\sqrt{31 - 3 - 3}} = -.27 \times 5 = -1.35.$$

So |Z| < 1.96, giving no evidence against the hyp. at 5% level.

Conclusion : The first sample with observed partial correlation coefficient of 0.4 may be supposed as drawn from a population with corresponding correlation coefficient of 0.6.

(ii) Here we have been given : $r_1 = 0.4$, $n_1 = 31$, $r_2 = 0.6$, $n_2 = 22$, k = 3, and want to test the hyp. (H₀) : $\rho_1 = \rho_2$.

Thus we compute the statistic

$$Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - k - 3} + \dfrac{1}{n_2 - k - 3}}}, \quad\text{where}\quad z_1 = 1.1513\log_{10}\left(\frac{1 + r_1}{1 - r_1}\right) = .42, \quad\text{and}\quad z_2 = .69.$$

$$\therefore\ Z = \frac{.42 - .69}{\sqrt{\dfrac{1}{31 - 3 - 3} + \dfrac{1}{22 - 3 - 3}}} = \frac{-.27}{\sqrt{\dfrac{1}{25} + \dfrac{1}{16}}} = \frac{-.27}{.32} = -.84$$

So |Z| < 1.96, giving no evidence against the hyp. at 5% level.

Conclusion : The two independent samples with observed partial correlation coefficients of 0.4 and 0.6 may be regarded as drawn from the same normal population.
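The two z tests of part [c] can be sketched as below. The function names are assumptions; the transformation is written with the natural logarithm, which is the same quantity as 1.1513 log₁₀[(1 + r)/(1 - r)].

```python
import math

def fisher_z(r):
    """z = (1/2) ln((1 + r)/(1 - r)), i.e. 1.1513 log10((1 + r)/(1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test_one_sample(r, rho, n, k):
    """Test a kth order partial r against a hypothetical value rho."""
    return (fisher_z(r) - fisher_z(rho)) * math.sqrt(n - k - 3)

def z_test_two_samples(r1, n1, r2, n2, k):
    """Test whether two kth order partial r's differ significantly."""
    standard_error = math.sqrt(1 / (n1 - k - 3) + 1 / (n2 - k - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error

# Figures of Exp. (12)[c].
print(round(z_test_one_sample(0.4, 0.6, 31, 3), 2))
print(round(z_test_two_samples(0.4, 31, 0.6, 22, 3), 2))
```

In each case |Z| is compared with 1.96 for a test at 5% level.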

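Exp. (13). Given that $\bar{x}_1 = 8$, $\bar{x}_2 = 5$, $\bar{x}_3 = 4$, $r_{12} = 0.86$, $r_{23} = 0.72$, $r_{31} = 0.65$, $\sigma_1 = 0.1204$, $\sigma_2 = 0.1306$, and $\sigma_3 = 0.154$. Set out the multiple regression equation of $x_1$ on $x_2, x_3$ and find $x_1$ for $x_2 = 8$, $x_3 = 6$. Also find the s.e. of estimate of $x_1$.

Sol. The multiple regression equation of $x_1$ on $x_2, x_3$ is

$$\frac{x_{1e} - \bar{x}_1}{\sigma_1}\,\Delta_{11} + \frac{x_2 - \bar{x}_2}{\sigma_2}\,\Delta_{12} + \frac{x_3 - \bar{x}_3}{\sigma_3}\,\Delta_{13} = 0$$

or

$$\frac{x_{1e} - 8}{0.1204}\,(0.4816) + \frac{x_2 - 5}{0.1306}\,(-0.3920) + \frac{x_3 - 4}{0.154}\,(-0.0308) = 0$$

or

$$(x_{1e} - 8)(4) + (x_2 - 5)(-3) + (x_3 - 4)(-0.2) = 0,$$

since $\Delta_{11} = (1 - r_{23}^2) = .4816$, $\Delta_{12} = (r_{31}r_{32} - r_{12}) = -.392$, and $\Delta_{13} = (r_{12}r_{23} - r_{31}) = -.0308$,

or $4x_{1e} - 32 - 3x_2 + 15 - 0.2x_3 + 0.8 = 0$

i.e. $x_{1e} = 0.75x_2 + 0.05x_3 + 4.05$. Hence the required equation.

Also, $x_{1e} = 0.75(8) + 0.05(6) + 4.05$, when $x_2 = 8$, $x_3 = 6$.

$\therefore\ x_{1e} = 10.35$. Hence the required estimate.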
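The working of this example can be sketched as a short routine that starts from the means, s.ds. and total correlation coefficients. The function name is hypothetical, and σ₂ is taken as 0.1306, the value implied by the arithmetic of the example (0.392/σ₂ = 3). The routine also returns the s.e. of estimate, σ₁(Δ/Δ₁₁)^{1/2}.

```python
import math

def trivariate_regression(means, sds, r12, r13, r23):
    """Regression of x1 on x2, x3 from summary figures.

    Returns (b12.3, b13.2, constant term, s.e. of estimate).
    """
    m1, m2, m3 = means
    s1, s2, s3 = sds
    delta_11 = 1 - r23**2
    delta_12 = r13 * r23 - r12            # cofactor of the (1,2) element
    delta_13 = r12 * r23 - r13            # cofactor of the (1,3) element
    delta = delta_11 + r12 * delta_12 + r13 * delta_13
    b12_3 = -(s1 / s2) * delta_12 / delta_11
    b13_2 = -(s1 / s3) * delta_13 / delta_11
    constant = m1 - b12_3 * m2 - b13_2 * m3
    se_estimate = s1 * math.sqrt(delta / delta_11)
    return b12_3, b13_2, constant, se_estimate

b12_3, b13_2, constant, se = trivariate_regression(
    means=(8, 5, 4), sds=(0.1204, 0.1306, 0.154),
    r12=0.86, r13=0.65, r23=0.72)
estimate = constant + b12_3 * 8 + b13_2 * 6   # x2 = 8, x3 = 6
print(round(b12_3, 2), round(b13_2, 2), round(constant, 2), round(estimate, 2))
print(round(se, 4))
```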


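Now we have the s.e. of the estimate as

$$\sigma_{1.23} = \sigma_1\left(\frac{\Delta}{\Delta_{11}}\right)^{1/2} = 0.1204\left[\frac{0.4816 + 0.86(-0.3920) + 0.65(-0.0308)}{0.4816}\right]^{1/2}$$

$$= 0.1204\left(\frac{0.1245}{0.4816}\right)^{1/2} = 0.1204\,(0.2584)^{1/2} = 0.1204 \times 0.508$$

$\therefore$ S.E.$(x_{1e}) = 0.0612$. Hence the result.

EXERCISE VII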

1. A research worker observed the following fresh and dry weights in ounces for a sample of his experimental material :

Fresh wts. ... 8 6 10 5 12 2 20 15 14 18

Dry wts. ... 3 2 2 2 4 1 5 4 3 4.

[a] Calculate the correlation between the two characters.

[b] Represent the data in a scatter diagram. (M.Sc. Ag. AU, 1963)

2. Find the two regression lines and also plot them on a graph 
to give an idea of correlation between x and y for the following — 

x. . 25 27 26 29 34 36 

y. .. 2 6 7 9 19 17. 

3. The following data are for the amount of water applied in inches and yield of alfalfa in tons/acre :

Water (x) ... 12 18 24 30 36 42 48

Yield (y) ... 5.27 5.68 6.25 7.21 8.20 8.67 8.42.

Find the regression of yield on water. Assuming that the relation between the two is linear, find the expected yield when the amount of water applied is 40″.

4. The following table gives the temperature of unhusked rice and the % breakage of rice grain in milling :

Temp. (degree) ... 33 34 29 36 38 28 29 36 34 30

% breakage ... 37 26 26 27 30 24 25 28 30 24.

Calculate the coefficient of correlation between the temperature and % breakage, and also test its significance. (M.Sc. Ag. AU, 1961)

5. Define the coefficient of correlation and find its value between x and y for the data given below :

x ... 8 10 15 17 20 22 24 25

y ... 25 30 32 35 37 40 42 45.

6. What do you understand by perfect positive and perfect negative correlations ? Is a significance test needed for this value ? The correlation coefficient between days to head and days to mature for varieties was found to be 0.4, is this significant ?

7. The following table enumerates the marks obtained by a class of students in Statistics in 1st and 2nd papers :

Marks in 1st (x) ... 80 45 55 66 58 60 65 68 70 75 85

Marks in 2nd (y) ... 82 66 50 48 60 62 64 65 70 74 90.

Calculate the two regression coefficients and hence or otherwise compute the correlation coefficient between x and y.

8. A sample of paired variates is given below :

x : 9 8 7 7 6 5 3 3 1 1

y : 9 9 8 6 8 6 4 3 1 1.

[a] Calculate r between x and y, and interpret the same.

[b] Illustrate the scatter of the points in a diagram.

(M.Sc. Ag. AU, 1965)

9. From the regression lines : 10y - 16x - 21 = 0

and 2y - 5x + 16 = 0.

[a] Find the means of x and y.

[b] Compute the coefficient of correlation between x and y.

[c] Plot the lines on a graph and give an idea of correlation. Also, obtain the means $\bar{x}$ and $\bar{y}$ from the graph and compare them with those obtained in [a].

10. For six pairs of observations of x and y, the following deviated values were calculated :

$\sum\xi = -1$, $\sum\xi^2 = 229$, $\sum\eta = -4$, $\sum\eta^2 = 92$, and $\sum\xi\eta = 139$, where $\xi, \eta$ stand for the deviations of x, y respectively from the corresponding assumed means.

Compute r and $b_{yx}$; and also test their significance.

11. [a] In a sample of 20 pairs of values, total solids and fat contents gave the correlation coefficient of 0.6. What would you conclude from this value ? (M.Sc. Ag. AU, 1959)

[b] The thickness of 20 annual rings of a tree and the corresponding annual rainfall were found to be correlated with a coefficient of +0.47. Is this correlation significant ?

[c] For two characters x and y, it is found that : r = +0.8, $\bar{x} = 25$, $\bar{y} = 22$, $\sigma_x = 4$ and $\sigma_y = 5$.

Calculate (i) the expected value of x for y = 12, and

(ii) the expected value of y for x = 33. (M.Sc. Ag. AU, 1962)

12. [a] The ranks of 15 participants in a beauty contest graded by two judges were as follows :

(1,10); (2,7); (3,2); (4,6); (5,4); (6,8); (7,3); (8,1); (9,11); (10,15); (11,9); (12,5); (13,14); (14,12); (15,13). Show that the two judges do not differ in their opinions.

[b] The ranks of the same 16 students in two subjects A and B were as follows :

(1,1), (2,10), (3,3), (4,4), (5,5), (6,7), (7,2), (8,6), (9,8), (10,11), (11,15), (12,9), (13,14), (14,12), (15,16), (16,13). Calculate the rank correlation for proficiencies of this group in subjects A and B.

13. [a] The following marks, out of 100, have been obtained by a class of ten students in Statistics :

Student ... I II III IV V VI VII VIII IX X

Paper I ... 78 60 38 75 45 25 85 89 62 32

Paper II ... 45 65 34 35 82 76 72 30 56 42.

Compute the coefficient of rank correlation and comment on the proficiencies of the class in the two papers.

[b] Ten competitors in a beauty contest are ranked by three judges in the following orders :

1st judge ... 1 6 5 10 3 2 4 9 7 8

2nd judge ... 3 5 8 4 7 10 2 1 6 9

3rd judge ... 6 4 9 8 1 2 3 10 5 7.

Use the rank correlation coefficient to discuss which pair of judges have the nearest approach to common tastes in beauty.

14. In a sample of 20 sets of values of three mutually dependent variables $x_1, x_2$ and $x_3$, the following data are available :

$\bar{x}_1 = 20$, $\sigma_1 = 2.0$, $r_{12} = 0.4$

$\bar{x}_2 = 15$, $\sigma_2 = 1.5$, $r_{23} = -0.2$

$\bar{x}_3 = 18$, $\sigma_3 = 3.0$, $r_{31} = 0.3$.

Calculate (i) any three partial regression coefficients, (ii) the multiple correlation coefficient of $x_1$ on $x_2, x_3$; and also test its significance; (iii) the estimate of $x_3$ for $x_2 = 12$, $x_1 = 15$; and the s.e. of the estimate.

15. In an experimental study on Cinchona plants, the following quantities were obtained from a sample of 32 plants :

$\bar{x}_1 = 21.68$, $\sigma_1 = 14.26$, $r_{12} = .367$

$\bar{x}_2 = 166.4$, $\sigma_2 = 56.67$, $r_{23} = .321$

$\bar{x}_3 = 3.14$, $\sigma_3 = 1.03$, $r_{31} = .684$,

where $x_1$ represents the yield of bark (oz.), $x_2$ the height of plants (inches) and $x_3$ the girth of plants 6″ above the ground (inches). Calculate (i) $r_{12.3}$, $b_{12.3}$ and $R_{1(23)}$. (ii) Set out the multiple regression equation of $x_1$ on $x_2, x_3$; and find the most likely value of $x_1$ when $x_2 = 150$ and $x_3 = 4$. Also find the s.e. of the estimate.

ANSWERS 

(1) r= + 0-91 (.2) y-l*65x-35 63; x=0t>Qy 1-23*43; 'imi id 
(-f)ve correlation (3) t/=0-1035 A-f-3'995, y«= 8 - i 35 tons/atre (4) 
r = +0.49, t = 1.48 (5) r = +0.98 (6) r = ±1, no, t = .23 (7) b_yx = 0.99,
b_xy = 0.85, r = +0.92 (8) r = +0.98, high (+)ve; (9) [a] x̄ = 11.22,
ȳ = 20.05 [b] r = +0.8 (10) r = +0.96, t = 7.38; b_yx = 0.60, t = 7.5 (11)
[a] t = 3.18 [b] t = 2.26 [c] (i) x_e = 18.60 (ii) y_e = 30 (12) [a] r = +0.51,
the positive value of rank correlation coefficient shows that the two
judges do not differ in their opinions, [b] r = +0.8 (13) [a] r = -0.3,
the standards of ability of the students are different in the two
papers [b] ria=+0"7,r 13 =0=r 23 thus the last two pairs of judges 
have the least disparity in their approach to beauty. (14). (i) 
^ia-a“=0’674, h 23 . 1 =0-198, A 13 . 2 =-0 234, (ii) R lte) =0-J12,F=3-14, 
(iii) Xw—lS’ld, e a . ia =2-66 (16) (i) r 12 . 3 =0*2134, h 12 . 3 =0-041 , 
Rj(i,)=0-38 (ii) x w =0 0413a 2 + 8 733 a 3 -12-6139, x» =28-5131, 
bg.gg— 10*15. 



Chapter VIII 

Sampling Techniques 

8.1 Introduction : Before designing any plan, we essentially 
require some quantitative data which can be obtained through an 
enquiry or survey. This survey may be of two types-(a) census survey, 
and (b) sample survey. A census survey is one in which all the units 
connected with the problem of investigation are taken into account, 
while in a sample survey only some selected units are considered. 

8.1 (a) Census survey : Here the conclusions regarding the population are based on the complete enumeration of the whole experimental material and hence the technique requires a lot of time, expenditure and other necessary resources. Thus the use of the technique is limited and it can be applied only to the problems related to some special fields of enquiry such as the census count, production, imports and exports etc. of certain principal commodities.

Below we give some specific situations in which the technique can successfully be employed :

(i) When the population or field of investigation is limited.

(ii) When the results are needed with maximum possible accuracy and reliability.

(iii) When the nature of the units under investigation is less dynamic.

(iv) When the selection of a sample is difficult.

(v) When sufficient money, time and other resources are available.

(a-1) Limitations of a census survey :

(i) It cannot be applied to an infinite population, or a population which is not completely known.

(ii) It does not give us a method of knowing the precision of the results drawn.

(iii) It cannot be used in the situations wherein the units to be collected have a considerable variation among themselves for the character under study.

(iv) It cannot be applied to the problems in which the sampling is easy, or the study is destructive.

(v) It requires a lot of money, time, manpower and other resources.

8.1 (b) Sample survey : Here the conclusions regarding the parent population are based on the results obtained only from a few selected units of the population and hence the technique saves considerable time, expenditure and other necessary resources. Thus the technique has a greater scope and a vast field of application to different types of enquiries such as the estimation of yield, area and income etc. of certain crops.

Below we give some specific situations in which the f technique 
can succeesfully be employed— 

(i) When the population or field of investigation is infinite, or 
the population is not completely known. 

(ii) When a character is to be estimated with some specified 
precision. 

(iii) When the units to be collected have a considerable 
variation among themselves for the character under study. 

(iv) When complete enumeration is not possible, or the 
study is destructive. 

(v) When the time, money and other resources are limited. 

(b-1) Advantages of a sample survey : The main advantages 
of a sample survey in comparison to a census survey are as follows : 

(i) Greater adaptability : In most of the cases where the 
population is infinite, or the whole of the population material is 
not known, or a destructive test is applied to measure the character 
under study, it is impossible to collect the information regarding 
the whole population. Thus in all such cases, a sample survey 
is the only scientific method to be adopted for the purpose. Also 
there is no alternative except sampling to know the precision of 
the results obtained. 

(ii) Greater scope : In some enquiries where an intensive 
study is to be made and highly trained personnel or specialized 
equipment are scarcely available, a sample survey gives 
more scope of enquiry than a census survey. 

(iii) Greater speed : The data can be collected and 
interpreted more quickly with a sample than with a census. 

(iv) Greater economy : The study of the population through 
a sample saves a considerable amount of cost, as only a small 
fraction of the aggregate is considered here. 

(v) Greater accuracy : By employing trained personnel, 
who are always available only in limited numbers, the results of 
a sample survey may be more accurate than those of a census survey. 

(b-2) Limitations of a sample survey : 

(i) In spite of the fact that a proper method of selection is 
employed, a sample may not truly represent its parent population 
and consequently the results are less reliable. 

(ii) The problems of choosing a proper sampling unit, size and 
technique are too difficult and technical. 

(iii) There is always a possibility of the inclusion of a human bias. 

(iv) It requires a good background of mathematics needed for 
computing the estimates and their s.e.s (standard errors). 

(b-3) Main steps of a sample survey : The requisite steps in 
designing a sample survey are as follows : 

(i) A clear statement of the objective of survey. 

(ii) The definition and size of the parent population to be sampled. 

(iii) The choice of a measuring device. 

(iv) The choice of a sampling unit. 

(v) The choice of a sampling technique. 

(vi) The choice of a sample size. 

(vii) The plan and organization of the field work. 

(viii) The descriptions of the expenditure, time and other resources 
to be utilized in the survey. 

(ix) The summary and analysis of the collected data. 

(x) The information gained for future surveys. 

8.2 Technical terms of a sample survey : Before any further 
discussion of the sampling theory, we think it more illustrative and 
advantageous to a beginner to define precisely some main technical 
terms that are commonly used in the sampling theory. 

(i) Population : In statistics, the whole field of investigation 
is called a population or universe. It may also be termed as the 
collection of individuals or of their attributes numerically specified. 
Further, a population composed of real individuals is called an 
existent population, while an aggregate of all feasible ways in which 
an event can happen is called a hypothetical population. The total 
no. of plants in a field is an example of a real or existent population, 
while the total no. of heads assumed to turn up in tossing 
a coin a large no. of times is an example of a hypothetical 
population. 

(ii) Population member : Each of the individuals of a mutually 
exclusive and exhaustive set of individuals comprising the whole 
population is called a member of the population. For example, a 
field (or a plant) is a member of the population of fields (or plants) 
under investigation. 

(iii) Population size : The total number of members or 
sampling units contained in a population is called the population 
size. Depending on the no. of members in the population, a 
population may be of two types : finite and infinite. If a population 
contains only a finite no. of members, we call it a finite 
population. For example, the population of inhabitants of India, 
the population of books in a library, and the population of students 
in Meerut University etc. are finite populations. On the other 
hand, a population with an infinite no. of members contained in it 
is characterized as an infinite population. For almost all practical 
purposes, a population consisting of a large no. of members may 
be considered as infinite. The population of pressures at various 
points in the atmosphere, and the population of throws of a die are 
examples of an infinite population. In fact, the discussion of a 
finite population is more difficult than that of an infinite one. Sometimes 
it is also difficult to ascertain whether a population is finite or 
infinite. The population of stars is an example of this type. 

(iv) Parameter : The population characteristic is called 
the parameter. For example, the mean and the variance of a normal 
population are its two parameters. 

(v) Sample : A selected number of population members is 
called the sample. It may also be termed as a representative part 
of the parent population which is selected for drawing conclusions 
regarding the population with a specified degree of confidence. In 
most of the problems, it is not practically feasible to examine every 
member of the population, but only a part of it. 
For example, if we are interested in finding the average yield of a 
particular crop in a certain locality, it is not possible, or rather not advisable, 
to take every field of that crop into consideration. Similarly, if one 
inquires into the average height of the human population of India, 
he cannot afford the time and expenditure required to measure the 
height of each individual. In all such cases, an investigator examines 
only a limited number of individuals or units of the parent 
population and expects that these selected units truly represent the 
whole population. 

(vi) Sampling unit : A population member itself or an 
aggregate of members selected for the sample is called a sampling unit. For 
example, either an individual, or a household, or a village etc. may 
be taken as the sampling unit in a survey on human population. 

(vii) Sample size : The number of sampling units selected 
from a given population is called the sample size. As a matter of 
fact, the problem of choosing an appropriate size of the sample 
involves some technical difficulties. Too large a sample 
assures more reliable results owing to the reduction in the 
sampling standard error of the estimate, but at the same time it 
raises the cost of survey and requires considerable time and other 
resources. On the other hand, too small a sample 
reduces the cost, time and resources of survey, but the results 
obtained therefrom are less reliable since the sample is rarely a true 
representative of its parent population. Thus the middle course to 
be adopted in choosing a moderate sample size is subject to several 
conditions of specified precision, cost and technique of sampling 
used. Sometimes the size is selected so as to give the minimum 
variance or the maximum precision of the estimate within the 
specified cost of survey, while in some other cases it is chosen to 
give the specified precision of the estimate at the minimum possible 
cost. Hence, the problem of determining an optimum sample size 
is undoubtedly quite a difficult task whose discussion is beyond 
the scope of this book. 

(viii) Estimate : The value of a population parameter 
computed from the sample is called the estimate. For example, if a 
random sample of n observations on x is drawn from a normal 
population with mean $\mu$ and variance $\sigma^2$, then the sample mean 
$\bar{x}$ $(= \sum x / n)$ and the sample variance $s^2$ $[= \sum (x - \bar{x})^2 / (n - 1)]$ are said to 
be the respective estimates of the population parameters $\mu$ and $\sigma^2$. 
Further, an estimate is said to be the best if its average value is 
equal to the parameter value and its s.e. is minimum, i.e. a best 
estimate must satisfy the properties of unbiasedness and minimum 
variance. An estimate, sometimes, is also known as a statistic. 

Thus an estimate or statistic is a suitably chosen function of 
sample observations. Suppose the value of this statistic is computed for a 
number of samples, each of the same size and drawn from the same 
universe. The values of the estimate thus obtained are usually 
different due to the fluctuations of random sampling. This series 
of different values under certain conditions follows some definite 
statistical frequency distribution. If the no. of samples drawn be 
larger and larger, this frequency distribution tends to a continuous 
distribution, often approximately a normal one. This distribution of the 
sample estimate is called the sampling distribution. For an unbiased 
estimate, the mean of this distribution is always equal to the 
corresponding population parameter value. Also the s.d. of this 
distribution is called the s.e. of the estimate. If this error for an 
unbiased estimate is minimum, then the chosen sample is supposed 
to be a true representative of its parent population. 
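
The idea of a sampling distribution can be checked on a very small 
population by listing every possible sample. The sketch below is only 
an illustration: the five yield values are hypothetical, and sampling 
is assumed to be without replacement. 

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical population of N = 5 plot yields (illustrative values only).
population = [12, 15, 18, 20, 25]
n = 2  # sample size

# Every possible sample of size n drawn without replacement.
samples = list(combinations(population, n))

# The sample mean of each possible sample: together these values
# form the sampling distribution of the mean.
sample_means = [mean(s) for s in samples]

# For an unbiased estimate, the mean of the sampling distribution
# equals the population parameter (here the population mean).
population_mean = mean(population)
mean_of_sample_means = mean(sample_means)

# The s.d. of the sampling distribution is the s.e. of the estimate.
standard_error = pstdev(sample_means)

print("number of possible samples :", len(samples))
print("population mean            :", population_mean)
print("mean of the sample means   :", mean_of_sample_means)
print("s.e. of the sample mean    :", round(standard_error, 3))
```

Here the mean of the sample means agrees with the population mean, 
which is what unbiasedness of the sample mean requires. 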

(ix) Sampling error : The error that occurs in a sample 
result is termed as the sampling error. In fact, however great care 
we take in selecting an appropriate size of the sample and also 
the technique of sampling, some random sampling errors are 
inevitable in the sample results. The average magnitude of these 
random sampling errors, known as the s.e. of the estimate, depends 
upon the no. of units chosen in the sample (n), the original 
variability in the material of the population ($\sigma^2$), the sampling 
technique employed and the method of estimation used. It is 
obvious that the variability in the population material is beyond 
the control of an investigator. As regards the sample size, it is known 
that the s.e. of an estimate is inversely proportional to the 
square root of the no. of units in the sample. Thus the amount of 
s.e. of an estimate can be minimized only by the refinement of 
the techniques of sampling and estimation, and also by choosing 
an appropriate sample size subject to the conditions of time, cost 
and precision as desired. 

(x) Sampling frame : The description of the available 
information on all the sampling units in the population is called the 
sampling frame. A sampling frame identifies the sampling units 
clearly and accurately. 

(xi) Sampling fraction : The ratio of the size of a sample 
to that of its parent population is called the sampling fraction or 
sampling ratio. For example, if a sample of n units is selected from 
a population of N units, then the ratio n/N is termed as the 
sampling fraction. The quantity (1 - n/N) or (N - n)/N is called 
the finite population correction factor (fpcf). If the sampling fraction 
is very low, i.e. n is very small relative to N, the fpcf is close to 
unity. In practice, the fpcf can 
be ignored whenever the sampling fraction does not exceed 5%, or 
even 10% in some of the problems. Thus if a sampling fraction is 
too low or a fpcf is close to unity, then the population size as such 
has no direct effect on the s.e. of an estimate. But in fact, the 
effect of ignoring a fpcf from the expression of the s.e. of an 
estimate is to overestimate the s.e. concerned. Also the inverse 
of a sampling fraction, i.e. the ratio N/n, is called the raising factor 
or the expansion (inflation) factor. 
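
The quantities defined above can be computed as in the following 
sketch. The function names are hypothetical, and the s.e. formula used 
is the standard one for the mean of a simple random sample drawn 
without replacement, in which the fpcf multiplies the variance; it is 
stated here as an assumption, since the text gives no formula. 

```python
import math

def sampling_fraction(n, N):
    """Ratio of the sample size n to the population size N."""
    return n / N

def fpcf(n, N):
    """Finite population correction factor (N - n)/N = 1 - n/N."""
    return (N - n) / N

def raising_factor(n, N):
    """Inverse of the sampling fraction, N/n."""
    return N / n

def se_of_mean(sd, n, N=None):
    """s.e. of the sample mean for a population s.d. `sd`.

    Assumed formula: (sd / sqrt(n)) * sqrt(fpcf).
    If N is omitted, the fpcf is ignored (taken as unity).
    """
    se = sd / math.sqrt(n)
    if N is not None:
        se *= math.sqrt(fpcf(n, N))
    return se

# Illustrative figures: 25 units drawn from 100 (a high sampling fraction).
print(sampling_fraction(25, 100), fpcf(25, 100), raising_factor(25, 100))
print(se_of_mean(10, 25), se_of_mean(10, 25, N=100))
```

Under this assumed formula, the s.e. computed without the fpcf is never 
smaller than the one computed with it, in line with the remark that 
ignoring the fpcf overestimates the s.e. 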

(xii) Sampling : The process of selecting a sample from the 
given population is called sampling. In a true sense, the sampling 
must be unbiased and representative. It clearly means that 
the method of selecting a sample should be such that an estimate 
furnished by the chosen sample must on an average be equal to its 
true parameter value with minimum standard error. In practice, a 
good no. of sampling techniques is available in the theory of 
sampling. The sampling technique can however be varied according 
to the nature of the population to be sampled. Some of the sampling 
techniques are : random sampling, stratified sampling, systematic 
sampling, cluster sampling, multistage sampling, multiphase 
sampling, quota sampling, purposive sampling, probability sampling, 
balanced sampling, systematic area sampling, sequential sampling, 
and ratio and regression sampling. 

8.2 Purpose of sampling theory : The purpose of sampling 
theory is to make sampling more efficient so that we may get 
reliable and as much information as possible regarding the parent 
population. It attempts to develop the procedures of sample 
selection and of estimation that provide, at the minimum possible 
(or fixed) cost, estimates which are precise enough for the 
purpose of the enquiry at hand. Thus, how far the sample represents 
the population and how to select a representative sample are the 
questions that the theory of sampling attempts to answer. The 
main principle adopted in the theory of sampling is that of inductive 
logic, where we move from a particular situation to the general. 
Knowing the form of the parent population, we usually intend to 
estimate the parameters or specify the limits within which the true 
parameters are expected to lie with a specified degree of certainty. 
It is, however, to be clearly understood that all sampling results 
are expressed in terms of probability. 

8.3 Role of sampling theory : It attempts to get the sample 
estimates as precise as possible (or of specified precision) within 
the specified cost (or at the lowest possible cost) and limited 
resources at hand. The principle of specified precision at the 
minimum cost recurs repeatedly in the presentation of sampling 
theory. In any specific situation, we cannot state in advance the 
exact amount of precision since we do not know readily how large 
a sampling error will be present in the sample estimate. Instead, 
the precision of a sampling technique is judged by the 
frequency distribution generated by the estimate when the 
technique is repeatedly applied to the same population. 

A considerable part of sampling theory deals with the 
computation of various formulae for determining the sample variances 
of the estimates that are obtained by employing the different 
sampling techniques. For samples of moderate sizes that are 
common in practice, there is often good ground to assume that the 
sample estimates are approximately normally distributed. The 
sample variance of an estimate gives a basis for the measure of 
precision of the estimate in inverse terms, since the precision is 
inversely proportional to the variance of the estimate. The sampling 
theory applicable to sample surveys is quite recent and contains a 
good no. of new appreciable developments. The modern sampling 
theory is based on finite populations whereas the older theory 
of sampling is confined to infinite populations. Further, the theory 
of sampling can be studied under two categories : (a) the sampling 
of attributes, and (b) the sampling of variables. The former is based 
on qualitative data while the latter is concerned with 
quantitative data. 

8.3 (a) The sampling of attributes : Here we are concerned 
only with the possession or non-possession of some specified 
qualitative characteristic (trait) or attribute by the units chosen in 
the sample. If a selected unit possesses the specified trait, it is known 
as a success, while its non-possession of the trait is called a 
failure. Most of the sampling theory is based on the 
fundamental assumption of random sampling, which may or may not be 
simple. By simple sampling we mean the random sampling where 
each event has the same probability of success (p) and its 
materialization is independent of the successes or failures of the preceding 
trials. Thus a simple sampling is necessarily a random sampling but a 
random sampling may not always be a simple sampling. This 
concept may be clear from the following example of sampling with, 
or without, replacement. 

Suppose an urn contains 6 white and 4 black balls each of 
the same size and shape. Then the prob. of drawing a black ball at 
the first trial is 4/10, while at the 2nd trial it is 3/9, provided the 
first ball drawn was black and is not replaced in the urn. Hence, obviously, the 
above two probabilities are different and the sampling, though 
random, is not simple. If the previously drawn ball 
is replaced in the urn before the next draw, the 
procedure of drawing or sampling the ball is termed as sampling with 
replacement, and the prob. remains 4/10 at every trial; the contrary 
case is sampling without replacement. 
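
The urn example can be verified with exact fractions. The sketch below 
assumes, as in the text, that the first ball drawn is black; the 
variable names are illustrative only. 

```python
from fractions import Fraction

white, black = 6, 4
total = white + black

# First trial: probability of drawing a black ball, 4/10.
p_first = Fraction(black, total)

# Second trial, given that the first ball drawn was black.
# Without replacement one black ball is missing from the urn: 3/9.
p_second_without = Fraction(black - 1, total - 1)
# With replacement the urn is unchanged: again 4/10.
p_second_with = Fraction(black, total)

print("first trial                  :", p_first)
print("second, without replacement  :", p_second_without)
print("second, with replacement     :", p_second_with)

# Sampling is 'simple' only when the probability stays constant.
print("constant p with replacement    :", p_second_with == p_first)
print("constant p without replacement :", p_second_without == p_first)
```

Fraction reduces 4/10 and 3/9 to their lowest terms, so the printed 
values appear as 2/5 and 1/3. 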

Further we note that if we deal with a finite population, the 
random sampling may or may not be a simple sampling according 
as it is with or without replacement. But a random 
sampling from an infinite population may always be treated as a simple 
random sampling since the drawing of a unit does not materially 
affect the probability distribution. Also the selection of a simple 
sample of size n from a population of N units is equivalent to a 
series of n independent trials with constant probability p of 
success, or q (= 1 - p) of failure. The probabilities of x = 0, 1, 2, ..., n 
successes are the terms in the binomial expansion of $(q + p)^n$, which 
gives us the sampling distribution of the no. of successes in the 
sample. The mean of this distribution is $np$ and the s.d. is $\sqrt{npq}$. 
Similarly, the mean of the proportion of successes is $p$ and its s.d. 
is $\sqrt{pq/n}$. Hence the precision of the proportion of successes in the 
sample, being the reciprocal of its s.d. or s.e., is $\sqrt{n/pq}$, which 
is directly proportional to $\sqrt{n}$ since p and q are constants. 
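
The binomial results quoted above can be checked numerically. In the 
sketch below the function names and the values n = 10, p = 0.4 are 
illustrative assumptions, not figures from the text. 

```python
import math

def binomial_probabilities(n, p):
    """Terms of the expansion of (q + p)^n: P(x successes), x = 0..n."""
    q = 1 - p
    return [math.comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def attribute_summary(n, p):
    """Mean and s.d. of the no. and of the proportion of successes."""
    q = 1 - p
    return {
        "mean_successes": n * p,
        "sd_successes": math.sqrt(n * p * q),
        "mean_proportion": p,
        "se_proportion": math.sqrt(p * q / n),
    }

n, p = 10, 0.4
probs = binomial_probabilities(n, p)

# Mean of the distribution computed directly from its terms.
mean_from_terms = sum(x * pr for x, pr in enumerate(probs))

print("sum of the terms   :", round(sum(probs), 10))
print("mean from the terms:", round(mean_from_terms, 10))
print(attribute_summary(n, p))

# Precision is proportional to sqrt(n): with four times as many
# units, the s.e. of the proportion is halved.
print(attribute_summary(4 * n, p)["se_proportion"])
```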

8.3 (b) The sampling of variables : Here we are concerned 
with the actual measurement on the units chosen in the sample for 
some specified quantitative characteristic or measurable value of a 
variable like the yield, or height, or weight etc. Each of the 
population members to be sampled gives its own measurement and all 
together they comprise some definite frequency distribution. A simple 
random sample of size n from a finite population of N units can 
be drawn in ${}^{N}C_{n}$ ways while an infinite no. of samples can be drawn 
from an infinite parent population. 

Below we discuss in brief some of the techniques of selecting 
a sample from a given population. 

8.4 Random sampling : The method of selecting n units 
from a population consisting of N units is called random sampling 
(or strictly simple random sampling) if all ${}^{N}C_{n}$ possible samples have 
an equal chance of being chosen. The sampling is based on the 
fundamental assumption that the population is homogeneous 
with respect to the character under study. In practice, a simple 
random sample is drawn unit by unit and any previously drawn unit 
is not replaced so that it may not appear in the sample more 
than once; this type of sampling is called random sampling without 
replacement. At any stage in the draw, this method gives an equal 
chance of selection to all the units not previously drawn. It should 
not, however, be forgotten that a random selection does not mean a 
haphazard selection. Also a random sampling from a finite 
population with replacement is equivalent to sampling from an infinite 
population without replacement. 
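
For a very small population the definition can be verified by 
enumeration. The following sketch uses hypothetical values N = 6 and 
n = 3; it counts the possible samples and the chance that any given 
unit enters the sample when every sample is equally likely. 

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 6, 3                      # illustrative sizes only
units = range(1, N + 1)          # units numbered 1 to N

# All possible samples of n distinct units (without replacement).
all_samples = list(combinations(units, n))

# Their number is N C n, so each sample has chance 1 / (N C n).
number_of_samples = comb(N, n)
chance_of_each_sample = Fraction(1, number_of_samples)

# Chance that a given unit is included = share of samples containing it.
inclusion_chance = {
    u: Fraction(sum(u in s for s in all_samples), len(all_samples))
    for u in units
}

print("possible samples      :", len(all_samples), "=", number_of_samples)
print("chance of each sample :", chance_of_each_sample)
print("chance for each unit  :", inclusion_chance)
```

Every unit turns out to have the same chance n/N of being included, 
which agrees with the equal-chance property stated above. 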

8.4.1 Method of selection : Certain methods of drawing a 
random sample that are convenient for small populations are not 
always suitable for large populations. The method of taking a 
random sample depends to some extent on the size and nature 
of the population to be sampled. Thus the method of taking a 
random sample of 100 students from the population of students of 
a college may not be suitable for taking a sample of 1 kg. of flour 
from a sack containing 100 kgs. of flour. Two methods of 
selecting a random sample, (a) sampling by lottery, and (b) sampling 
by random numbers, are described below. 

8.4.1 (a) Sampling by lottery : In this method of sampling, 
a no., any of 1 to N, is assigned to every unit of the 
population to be sampled. These numbers are written on separate slips 
of paper of equal size and the totality of these slips is mixed 
thoroughly in a bowl or rotating drum. Finally a blindfold 
selection is made from the container, the selected no. of slips, say 
n, being equal to the size of the sample and the nos. on them identifying 
the population units to be selected. The practical difficulty of the 
method lies in shuffling or mixing the miniature population. If the 
population to be sampled is large enough, the miniature population 
to be constructed is also large. Thus the method can be 
suitable only in the case of small populations, since a thorough 
mixing of numbered slips is essential for complete elimination 
of bias. 

8.4.1 (b) Sampling by random numbers : To avoid some 
difficulties of lottery sampling, the random number tables are used 
to derive a sample from the specified population. Here as well every 
unit of the population is allotted a no., any of 1 to N. Instead of 
forming a miniature population, we take any page of random 
sampling numbers and note n random numbers, n being the size of 
the sample, either in a horizontal or a vertical line (neglecting zero, 
repeated numbers and any no. greater than N). Though several 
persons have constructed random no. tables, the 
following are commonly used. 

(i) Tables by Tippett : The random number tables (Random 
Sampling Numbers, Tracts for Computers, No. 15, published by 
Cambridge University Press) constructed by L.H.C. Tippett are 
most satisfactory. These tables consist of 10400 four digit nos. which 
are constructed out of 41600 digits taken from census reports by 
combining them in fours. Although these nos. were chosen 
haphazardly, their application in numerous investigations has shown 
their trustworthiness. 

(ii) Tables by Fisher & Yates : The random number tables 
(Statistical Tables for Biological, Agricultural and Medical Research 
Workers, published by Oliver and Boyd) constructed by Prof. R.A. 
Fisher and F. Yates are of great importance in various statistical 
surveys. 

(iii) Tables by ISI, Calcutta : The random number tables 
constructed by the authorities of Indian Statistical Institute, 
Calcutta are also very common in statistical investigations. 

(iv) Tables by Kendall & Smith : The random number tables 
(Random Sampling Numbers, Tracts for Computers, No. 24, 
published by Cambridge University Press) constructed by Prof. 
M.G. Kendall and B.B. Smith are very popular in numerous 
statistical surveys. These nos. are obtained by using a randomizing 
machine, and are quite reliable. 

8.4.2 Assumptions of random sampling : 

(i) Each population unit has an equal chance of being chosen 
in the sample. 

(ii) Each possible sample has an equal chance of being chosen 
as a sample. 

(iii) The selection of the units is independent and free from 
any human bias. 

(iv) The population is homogeneous with regard to the 
character under study. 

8.4.3 Properties of random sampling : 

(i) It is the easiest possible method of selecting a sample. 

(ii) It gives an equal chance to every population unit to be 
sampled. 

(iii) It does not require any extensive plan. 

(iv) It saves a considerable time, labour and cost of survey. 

(v) It gives a true representative of the population provided 
the sample size is suitably chosen. 

(vi) It gives the least chance to human bias. 

(vii) The precision of its sample results can easily be 
examined. 

8.4.4 Limitations of random sampling : 

(i) It cannot be applied to situations where some 
population members must necessarily be selected in the sample. 

(ii) The method is not applicable to heterogeneous populations. 

(iii) It does not give reliable results through small 
samples. 

Exp. (1) Select a random sample of size 20 from a population 
of 8500 sticks. 

Sol : Let all the sticks of the population be numbered from 
1 to 8500 in some order. Now we consult a page of Tippett's 
random number tables and select the first 20 nos. either row-wise 
or column-wise such that none of them is zero, or repeated, or 
greater than 8500. If the nos. obtained are : 3200, 3525, 3394, 1985, 
7693, 0011, 3142, 4625, 7518, 2472, 5182, 7844, 7780, 3362, 3857, 
6058, 4505, 1940, 7305 and 7947, then the sticks bearing these nos. 
would constitute the desired random sample of 20 sticks. 

Note : Though Tippett's random nos. are four digit nos., 
in many cases they are less than 1000 or even 100, e.g. 0058. Thus 
even if the units (N) in the population are less than 100, a sample 
of any size (n < N) can be drawn from these tables. In such cases 
some modified methods are available, but the discussion is omitted. 
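
The table look-up of Exp. (1) can be imitated on a computer. The 
sketch below is an assumption-laden stand-in: it generates four digit 
pseudo-random numbers in place of a printed table such as Tippett's, 
and the function name and the seed are hypothetical. The rule of 
neglecting zero, repeated numbers and numbers greater than N is the 
one given in the text. 

```python
import random

def draw_by_random_numbers(N, n, digits=4, seed=None):
    """Select n distinct unit numbers between 1 and N.

    Numbers of `digits` digits are generated one by one, as if read
    from a random number table; zero, repeated numbers and numbers
    greater than N are neglected.
    """
    usable = min(N, 10 ** digits - 1)   # largest no. that can ever be accepted
    if not 0 < n <= usable:
        raise ValueError("sample size n is not attainable")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        number = rng.randrange(10 ** digits)   # 0000 to 9999 when digits = 4
        if number == 0 or number > N or number in chosen:
            continue                            # neglect this number
        chosen.append(number)
    return chosen

# A sample of 20 sticks from a population of 8500 numbered sticks.
sample = draw_by_random_numbers(N=8500, n=20, seed=1)
print(sample)
```

The sticks bearing the printed numbers would form the sample; a 
different seed corresponds to starting at a different place in the 
table. 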

8.5 Stratified sampling : The method of selecting n units 
from a population consisting of N units is called stratified sampling 
(or strictly stratified random sampling) if k simple random samples 
of sizes $n_i$ are independently drawn from k sub-populations of 
respective sizes $N_i$ such that $n = \sum n_i$ and $N = \sum N_i$ for $i = 1, 2, ..., k$. 
These sub-populations are non-overlapping, homogeneous within 
themselves and are called strata. The sampling is based on the 
fundamental assumption that the population is markedly 
heterogeneous with respect to the character under study. For example, 
the human population of a country may be stratified according to 
the age-groups, or social circumstances, or economic conditions 
etc. Thus in stratified sampling, the population of N units is first 
divided into k distinct strata of $N_1, N_2, ..., N_k$ units and then simple 
random samples of $n_1, n_2, ..., n_k$ units respectively are drawn from 
these strata independently, which all together comprise the whole 
sample so that $n = n_1 + n_2 + ... + n_k$. The technique of determining the 
sample sizes of the strata, where n is known, is called the problem 
of allocation. Sometimes the sample sizes are taken proportional 
to the stratum sizes and sometimes proportional to the s.d.s within 
the strata. In fact, the discussion of allocation is beyond the scope 
of this book. 

Below we give some situations in which the technique can 
successfully be employed : 

(i) When the sampling problems are markedly different in 
different parts of the population. 

(ii) When several field offices are required for administrative 
convenience in supervising the survey work. 

(iii) When the results of specified precision are wanted for 
only a certain sub-division of the population. 

(iv) When some gain in precision in the estimates is desired 
over random sampling. 

Exp. (2) A population of 1000 units is classified into four 
groups consisting of 100, 200, 300 and 400 units respectively with 
regard to some specified character. Outline the procedure of 
selecting a stratified sample of 20 units. 

Sol : Let each of the four groups or strata be treated as a 
population in itself. Taking the sample sizes proportional to the 
stratum sizes, i.e. $n_i = n N_i / N$, and using the method of random sampling, 
we can draw simple random samples of sizes 2, 4, 6 and 8 respectively 
from the strata of sizes 100, 200, 300 and 400. Thus the whole 
sample of size n (= 2 + 4 + 6 + 8) = 20 is the desired stratified sample 
from the given population. 
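
The sample sizes 2, 4, 6 and 8 in Exp. (2) are proportional to the 
stratum sizes. A minimal sketch of this allocation, followed by an 
independent random draw within each stratum, is given below. The 
function names are hypothetical, units are assumed to be numbered 
serially stratum by stratum, and the simple rounding used here need 
not add up to n when the proportions are not whole numbers. 

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Sample size for each stratum, n_i = n * N_i / N (rounded)."""
    N = sum(stratum_sizes)
    return [round(n * N_i / N) for N_i in stratum_sizes]

def stratified_sample(stratum_sizes, n, seed=None):
    """Draw a simple random sample independently from every stratum."""
    rng = random.Random(seed)
    sample_sizes = proportional_allocation(stratum_sizes, n)
    sample = []
    first = 1                                  # serial no. of the first unit
    for N_i, n_i in zip(stratum_sizes, sample_sizes):
        units = range(first, first + N_i)      # units of this stratum
        sample.append(sorted(rng.sample(units, n_i)))
        first += N_i
    return sample

strata = [100, 200, 300, 400]
print(proportional_allocation(strata, 20))
for stratum_no, chosen in enumerate(stratified_sample(strata, 20, seed=1), 1):
    print("stratum", stratum_no, ":", chosen)
```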

8.6 Systematic sampling : The method of selecting n units 
from a population consisting of N (= nk) units is called systematic 
sampling if the first unit for the sample is selected at random 
from the first k units of the numbered population and then every kth 
unit thereafter. The sampling is based on the assumption that the 
complete list of the population members is available. Thus in a 
systematic sampling of n out of N, the population units are serially 
numbered from 1 to N in some order and are such that N = nk, 
where n and k are both integers. Here we first draw a random number 
not greater than k, say i, from the first k units and then every kth 
subsequent unit in the population. Thus the systematic sample contains 
the n units : ith, (i + k)th, (i + 2k)th, ..., [i + (n - 1)k]th at regular 
spacings. For example, the selection of every kth block from a 
list of blocks, or the selection of every kth time interval for 
observing the no. of telephone calls at a public telephone-booth etc. 
can give us a systematic sample provided the first unit, not greater 
than k, is chosen with the help of the random no. tables. 

A systematic sample is not truly random because of the fact 
that only the 1st unit for the sample is selected randomly, which 
determines the whole sample, since the remaining units are 
automatically determined by a constant interval. But it may be 
equivalent to a simple random sample if the numbering of units 
in the population is effectively random. For example, an alphabetical 
list of names may be regarded as a random list. It may also be equivalent to a stratified 
random sample provided the units in each stratum are randomly 
listed. The difference is that in stratified sampling the unit chosen from 
each stratum is based on random selection, while in systematic 
sampling its position relative to the unit in the first stratum is 
predetermined. A systematic sample also resembles a cluster 
sample, being equivalent to a sample of one cluster chosen out of 
the k clusters of n units each. 

The systematic sampling has the following advantages : 

(i) Drawing a sample is easier, less time consuming and often 
without mistake. 

(ii) The systematic sample is spread more evenly over the 
whole population and is thus sometimes more precise than stratified 
random sampling also. 

Below we give some situations in which the method can 
successfully be employed : 

(i) When there is no periodicity in the list of the sampling units. 

(ii) When the kth units which constitute the sample are not 
alike or correlated. 

(iii) When we want simplicity and low expenditure on survey. 

(iv) When the complete list of population members is available. 

Exp. (3) Obtain a systematic sample of 20 from a list
comprising the population of 1000 individuals.

Sol : First we number the population members from 1 to 1000
in some order. Then taking N = nk, i.e. 1000 = 20k, we get k = 50.
Thus we have to select the 1st unit for the sample between 1 and
50 from the random number tables. Let this no. selected at random
be the 35th unit of the population. Thus the subsequent units in
the sample are: (35+50)th, (35+2x50)th, ..., (35+19x50)th, i.e.
85th, 135th, ..., 985th. The sample, therefore, is constituted by
the 20 units viz. 35th, 85th, ..., 985th.
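
The selection rule of this example can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and its arguments are assumptions made here, and the sketch assumes that N is an exact multiple of n, as in the example.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Return the 1-based unit numbers of a systematic sample.

    Assumes population_size = sample_size * k exactly (N = nk).
    """
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N = n * k exactly")
    k = population_size // sample_size  # sampling interval
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)  # random start between 1 and k inclusive
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # Every k-th unit after the random start belongs to the sample.
    return [start + i * k for i in range(sample_size)]


# The worked example: N = 1000, n = 20, random start 35.
sample = systematic_sample(1000, 20, start=35)
print(sample[:3], "...", sample[-1])  # [35, 85, 135] ... 985
```

When `start` is omitted, the random start is drawn by the computer in place of the random number tables.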

EXERCISE Vl'l 

1. Select a random sample of size 10 from a population of 
245 fields. 

2. Explain the terms : sample, population, random selection.

(M. Sc. Ag. Agra, 1965)

3. Explain in brief the following:

random sampling, stratified sampling and systematic sampling.

4. Compare and contrast the merits and drawbacks of sample
and census studies.

5. What is meant by sample-methods of enquiry ? When is it
adopted and what are its advantages ?

6. Describe the special features of the different types of
universes from which samples can be drawn.

7. Suggest a method of obtaining a random sample of words
from the English language by the use of random sampling numbers
and a dictionary.

8. Comment whether the following samples are representative. 

(a) A mixture of sand and saw dust is sampled by taking 
a small quantity from the bottom. 

(b) A basket of grapes is sampled by taking a handful from 
the top. 

(c) Investigators into the size of the families in a town
conducted a house-to-house inquiry in the afternoon, ignoring
those houses at which there is no reply.

9. Obtain a systematic sample of 10 from the list of voters
comprising a population of 500 voters.



Part II 

The Experimental Designs 




Chapter I 


Design of Experiments 


Meaning & Definition :

Men have always been learning by experience from
experimental observations. The observations obtained
from an experiment that has been carefully planned and well
designed in advance give entirely valid inferences. In fact,
inductive inference is the only method by which new knowledge
comes to this world. Any inference drawn from a sample
regarding its parent population is always attended by some
degree of uncertainty, which may be defined by the methods
of mathematical probability. With this assumption, we define
the Design of Experiment “as that logical construction of
the experiment in which the degree of uncertainty with
which the inference is drawn may be well defined.”

The principles of the design of experiments have so far 
been most explicitly developed in the field experimentation. 
In the field experimentation, we compare the different varieties 
of a crop, the different fertilizers, the different methods of 
seed-treatment and sometimes the different pieces of land 
itself. Thus, it is common to test the yield performance 
of a number of new varieties in comparison to a standard 
variety and also to examine the response of a crop to graded 
application of one or more fertilizer treatments. On the 
other hand, we may be interested in knowing the effect of
the different cultivation processes. These objects of comparison
are called treatments.

Suppose we have a large homogeneous field divided
into different plots and we apply different treatments to these
plots. Then the yields of these different plots are recorded. If
some of the treatments produce bigger effects than the others,
it remains for the experimenter to decide whether the observed
differences are due to the treatment effects or due to chance
(uncontrolled) factors. Our past experience tells us that the
yields of the plots will vary even under the same treatment.
This variation from plot to plot is due to the uncontrolled
(chance, random) factors and is called the experimental
error. In order to test the significance of the difference
between two treatments, we first require an estimate of
the experimental error and then apply the appropriate test of
significance. The basic requirement for the former is to repeat
the treatments a number of times, and for the latter the
random allocation of the treatments to the various plots. We
always desire a lower magnitude of the experimental error, since
a lower error detects smaller real differences, i.e. it
increases the precision of the design. This can be achieved partly
by replication (repetition of treatments) and mainly by
adopting the technique of local control. Local control is
the technique of dividing the whole experimental field, which
may be expected to be heterogeneous with regard to soil
fertility, into homogeneous blocks row-wise, column-wise or
both, according to the fertility gradient present in the soil of
the field. Thus the basic principles of the design of experiments
are:

(1) Replication, 

(2) Randomization and 

(3) Local control. 

(1) Replication : The repetition of the treatments
under comparison is called replication. The purpose of
replication is twofold:

(i) It reduces the experimental error, since we know
that the sampling variance of a mean yield is $\sigma^2 / r$, where
$\sigma$ is the S. D. of the individual observations (i.e. the
plot yields) and $r$ is the number of replications. Therefore,
replication has an important but limited role in increasing
the precision of the design.

(ii) The main purpose of replication is to supply an
estimate of the experimental error, for which there is no other
alternative and without which the significance of the difference
between two treatments cannot be judged.
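
To see why the role of replication in reducing the error is important but limited, the short sketch below tabulates the sampling variance of a mean, sigma^2 / r, for increasing r. The value of sigma used here is purely hypothetical and the function name is an assumption made for illustration.

```python
import math


def variance_of_mean(sigma, r):
    """Sampling variance of a mean of r observations, each with S.D. sigma."""
    return sigma ** 2 / r


sigma = 4.0  # hypothetical S.D. of individual plot yields
for r in (1, 2, 4, 8, 16):
    var = variance_of_mean(sigma, r)
    # The standard error falls only as 1 / sqrt(r): doubling the
    # replications does not halve it.
    print(f"r = {r:2d}  variance = {var:6.2f}  S.E. = {math.sqrt(var):.2f}")
```

Each doubling of r halves the variance, so the gain from every additional replication becomes smaller as r grows.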

(2) Randomization : The allocation of treatments
to the various plots in a random manner is called
randomization. The purpose of randomization is:

(i) To guarantee the validity of the test of significance,
as the analysis of variance (A. V.) test, which is used to test
the homogeneity of the data (i.e. to test whether the different
treatments are equally effective), is based on the assumption
of the randomness of the observations.

(ii) To ensure that the different treatments are, on the
average, subject to the same environmental effects. Therefore
the difference between any two treatments remains free from
bias.

Thus, when the treatments are replicated a number of
times and allotted randomly to the various plots in the field, we
are in a position to test the significance of the observed
treatment differences with the help of statistical tests.

(3) Local Control: When the experimental area is
heterogeneous and the treatments are scattered randomly over
the whole area, the soil heterogeneity will also enter the
chance factor and thus increase the experimental error. It is
desirable to reduce the experimental error as far as practicable,
since a lower experimental error can detect a smaller real
difference between the treatments. In order to remove the soil
fertility effect from the experimental error, the whole
experimental area is divided into homogeneous groups (blocks)
row-wise, column-wise or both, according to the fertility gradient,
in such a way that the variation between the blocks is maximum
and within the blocks is minimum. The randomization is then
restricted to within the blocks. This process of reducing the
experimental error by dividing the experimental area into more
homogeneous blocks is known as local control. The introduction
of local control ensures that the comparison between the
treatments is made under conditions as similar as is possible
with the resources at hand, and this helps in the reduction of the
experimental error. Various forms of arranging the plots into
homogeneous blocks have so far been evolved, which are called
experimental designs.

Uniformity Trial: As mentioned above, for
reducing the experimental error we have to divide the whole
experimental area into more homogeneous blocks, and for that we
must have a correct idea of the fertility variation in the field.
This idea may be obtained from the result of a uniformity trial.
A uniformity trial consists of growing the same crop with
the same treatment all over the field. The whole field is
divided into several small plots of equal size and the yield from
each of these units (plots) is recorded separately. From such
records, we can prepare a fertility contour map which gives
a good idea of the nature of the soil fertility variation. The
fertility contour map is prepared by joining the points of equal
fertility through lines.

[Figure: Fertility contour map; different shades show different fertility.]

An eye inspection of the fertility contour map shows
that the fertility does not increase or decrease in any systematic
pattern, but that its distribution over the whole field is random.
It is also observed that adjacent units are more nearly similar in
fertility than those farther apart. A homogeneous block can be
formed by combining a number of adjacent units. The number of
such units which will form a block can be determined by
calculating the coefficient of variation (c. v.) for several
combinations of units and choosing that combination for which the
c. v. is minimum. The variation in the plot-yields under a
uniformity trial is due to the uncontrolled factors called
experimental error. Thus a uniformity trial gives an estimate of
the experimental error.

In short, a uniformity trial gives:

(i) an estimate of the nature and extent of the fertility
variation,

(ii) an estimate of the experimental error, and

(iii) a clue to reduce the experimental error by forming
homogeneous blocks.
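
The choice of the combination of units by the coefficient of variation can be sketched as follows. This is one possible reading of the rule, given only as an illustration: the grid of uniformity-trial yields is hypothetical, the function names are assumptions made here, and the c. v. is computed between the totals of the combined units for each candidate shape.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Coefficient of variation in per cent: 100 * S.D. / mean."""
    return 100.0 * stdev(values) / mean(values)


def combined_totals(grid, rows, cols):
    """Total yield of each group of rows x cols adjacent units."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % rows or n_cols % cols:
        raise ValueError("the shape must tile the field exactly")
    totals = []
    for top in range(0, n_rows, rows):
        for left in range(0, n_cols, cols):
            totals.append(sum(grid[i][j]
                              for i in range(top, top + rows)
                              for j in range(left, left + cols)))
    return totals


# Hypothetical uniformity-trial yields (same crop, same treatment).
yields = [
    [20, 22, 25, 27],
    [21, 23, 24, 28],
    [19, 22, 26, 29],
    [20, 21, 25, 27],
]

# Candidate shapes (rows, cols) of combined units to be compared.
shapes = [(1, 2), (2, 1), (2, 2)]
cv_by_shape = {s: coefficient_of_variation(combined_totals(yields, *s))
               for s in shapes}
best_shape = min(cv_by_shape, key=cv_by_shape.get)
for shape, cv in cv_by_shape.items():
    print(shape, round(cv, 2))
print("shape with minimum c.v.:", best_shape)
```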

Precision: The precision of a design is the ability
with which it detects the smaller real differences between the
treatments. One design is more precise than another
if the least significant difference (critical difference) between
the treatments at a given level is lower in it than in the
other. In other words, the prob. of observing a difference less
than or equal to a given value measures the precision of the
design. Thus the degree of uncertainty with which we draw
our conclusions is a measure of the precision of the experiment.
To find a more suitable design for a particular problem means to
achieve the maximum precision with the given cost and
resources. The precision of a design can be increased by
decreasing the experimental error (random error): the smaller
the experimental error, the greater is the precision. Thus the
precision can be increased by increasing the no. of replications
and by using the technique of local control.

Accuracy: In any experiment, the plot-yields are also
affected by a no. of factors other than the treatments, such as
cultural operations (ploughing, hoeing, weeding, earthing etc.),
manurial doses, cultivation processes etc. If the effects of these
factors are not the same on the various plots, then the treatment
differences will be subject to a constant bias (systematic error)
which, unlike the experimental error, cannot be diminished by
increasing the no. of replications. In order to diminish this bias,
the experimental technique should be so refined that all the
plots are equally affected by the above factors. The smaller the
amount of bias, the greater is the accuracy of the design. Thus
the accuracy of a design is a measure of the lack of bias.

Experimental Material: The material on which the
experiment is performed is called the experimental material,
e.g. an agricultural field, a herd of cows, patients in a hospital,
plants in a green house etc.

Experimental unit: The whole experimental material
is divided into a no. of small parts to which the treatments
are applied. These small parts are called the experimental units,
e.g. a plot of a field, a cow in a herd and a plant in a green house
are experimental units.

Following are the main considerations in
planning an experimental design:

(i) Object (Formulation of hypothesis to be tested) :
Every experiment has a definite object to achieve and this
object is indirectly defined by the null hypothesis (Ho). The
null hypothesis (or object) must be clearly and exactly stated
without any ambiguity. For example, if we want to compare
the effects of a number of manurial treatments on the yield of
a certain crop, then we must decide whether the yield is the
grain yield, the straw yield or the total produce (grain yield +
straw yield). In addition to making a decision regarding the yield,
we must also decide the variety, irrigation conditions, nature of
the soil, cultural treatments etc. to be used in the
experiment. In the absence of the knowledge of all such relevant
details, neither can valid conclusions be drawn from the
experimental data nor can the scope of the experiment be
decided. Therefore, a clearly defined object is an essential part
of the planning of an experiment.

(ii) Scope : The conditions under which the
experimental results are valid decide the scope of the
experiment. For example, if the results of one design are valid
for a particular variety of a crop and soil, while those of another
are valid for any variety and soil, then the former is of more
limited scope than the latter. The scope of an experiment can be
widened by testing a number of factors and their levels
simultaneously. It is desirable to have a sufficient scope of the
experiment as far as the experimental material and the cost permit.

(iii) Feeler experiment : Suppose the object of the
design is to compare a foreign imported variety of a certain
crop against a local variety. Then, before starting the actual
experiment, it is necessary first to know whether the imported
variety will germinate and prove itself a successful variety
under the changed climatic and soil conditions. This can be
known by sowing the new variety in some plots, called the
observation plots. Such an experiment, which is carried out to
test the suitability of some treatments, is called a feeler
experiment. It is recommended only for the situations where the
suitability of some treatments is doubtful; otherwise it will be
a waste of resources and time.

(iv) Experimental site:

The experimental site should be as homogeneous as
possible. An idea of the uniformity can be had by having a glance
at the standing crop or the surface, or better from uniformity
trial data. The uniformity trial is recommended for newly
acquired lands for which we do not have a prior idea of the
fertility variation; otherwise it will only delay the experiment.
In field experimentation it is very difficult to have a uniform
experimental site; a fertility gradient will be present in one
or more directions. Another care in selecting the experimental
site is that there should be no tree on its border, as the shade
of the tree affects the yields of the border-plots. In case
there is a tree on the border, the area which is expected to be
affected by the tree should be excluded from the experimental
area.

(v) Choice of the experimental design:

The choice of the experimental design depends upon the
heterogeneity of the experimental site, the no. of treatments and
the relative precision with which the treatments are to be
compared. If:

(1) the experimental site and environment are uniform,
then a C. R. D. is used. This design compares all the
treatments with equal precision.

(2) the experimental site is not uniform but can be
grouped (according to a single criterion of classification) into
homogeneous groups (blocks) of land, then R. C. B. D. or R.
B. D. is used, provided the no. of treatments is not large;
otherwise an Incomplete Block Design will be used.

(3) the experimental site is not uniform but can be
grouped according to a double criterion of classification into
homogeneous groups of land, then L. S. D. is used, provided
the no. of treatments ranges from 6 to 12. In field
experimentation, this situation arises when the fertility gradient
is in two directions at right angles.

(4) some of the treatments are to be compared with
relatively higher precision than others, then the Confounding
Scheme is used, provided the precision of the higher order
interactions is to be sacrificed.

(5) some of the treatments require larger plots and are
to be studied with relatively lower precision than the others,
then the S. P. D. is used.

Another consideration for the choice of the experimental
design is the availability of the resources. For example, the
lack of proper training and skill prevents the use of complex
designs even if they are of higher precision.

(vi) Replication: An adequate no. of replications for a
given no. of treatments cannot be suggested in advance of the
experiment, as it requires the knowledge of the fertility-variation
in the experimental site, which is rarely known. In the absence of
this knowledge, the rule is to take such a no. of replications
as provides at least 12 d. f. for error. This rule is based on the
fact that the tabulated values of F do not decrease rapidly beyond
$\nu_2 = 12$, where $\nu_2$ denotes the error d. f.
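
As an illustration of the 12 d. f. rule, the sketch below finds the smallest number of replications for a completely randomized design with equal replications. It assumes that the error d. f. in that design equal v(r - 1) for v treatments each replicated r times; other designs have different error d. f., so the function (whose name is hypothetical) applies to this case only.

```python
import math


def min_replications_crd(v, min_error_df=12):
    """Smallest r such that the C.R.D. error d.f., v * (r - 1), is >= min_error_df."""
    if v < 2:
        raise ValueError("at least two treatments are needed")
    # v * (r - 1) >= min_error_df  <=>  r >= min_error_df / v + 1
    return math.ceil(min_error_df / v) + 1


for v in (3, 4, 5, 6):
    r = min_replications_crd(v)
    print(f"v = {v}: r = {r}, error d.f. = {v * (r - 1)}")
```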

(vii) Randomization: For the validity of the
experimental results, it is necessary that the different treatments
should be randomly allocated to the different experimental units
(plots). The procedures of randomization are different for
different experimental designs.

(viii) Refinement of experimental technique: In
order to obtain the real treatment differences, it is essential
that all the experimental plots should be subjected to the same
type of cultural operations other than those under investigation.

(ix) Ancillary observations: When the plot-yields
are suspected to be affected by a character which is uncontrolled
and varies randomly from plot to plot, it should be measured
in addition to the yield of each plot. The measurements made
on the uncontrolled character are called ancillary observations.
These observations are used to eliminate the variation due to
the uncontrolled character from the experimental error and
thus to increase the precision of the experiment.

(x) Shape of blocks and plots: When the experimenter
has a choice of the experimental area, he should choose the
one which seems to be uniform. An idea of the uniformity can
be obtained from the appearance of the surface and the
previous crop. After the selection of the experimental area, he
has to investigate the fertility gradient of the area. Suppose
the fertility-gradient is in the direction from A to B.

Now he has to divide the whole experimental area into
different homogeneous blocks such that the variation between
them is maximum. This can be achieved by making the blocks
as compact (square) as possible. These compact blocks are
arranged one after another along the fertility-gradient. Further,
each block will be divided into as many plots as the no. of
treatments. In dividing the blocks into plots, the object is just
opposite to the above, i.e. the variation within the blocks should
be minimum. This object can be attained when the plots are as
alike as possible to each other. Thus the shape of the plots
should be rectangular, with the longer side parallel to the
direction of the fertility-gradient, and all of them should be in
a row across the block as shown in the figure. When the
no. of treatments is large, the plots have to be arranged in two
or more rows in order to maintain the compactness of the
blocks. An exception to this is the case of a sloping
experimental site. In this case, the plots are arranged in a
single row with their longer sides parallel to the slope, as the
fertility-gradient is in the direction of the slope. This
arrangement sacrifices the compactness of the blocks in the case
of a larger no. of treatments.

(xi) Size of the plot : The size of the plot depends
upon the experimental area available, the no. of treatments and
their replications, and the crop. The optimum sizes of the plots
for different crops are given in the following table:

S.N.    Name of the crop    Plot-size in acre

1.      Cereals             1/10
2.      Maize               1/20
3.      Sugar-cane          1/40 to 1/20
4.      Vegetable           1/80

(xii) Border effect : The borders of a plot are also
affected by the treatments given to the neighbouring plots,
while the central plants remain unaffected. In order to eliminate
this effect, a non-experimental border should be left around
each plot, which is given the same treatment and cultivated in
the same way (cultivation practices) as the plot, but its yield
is harvested separately and is not taken into consideration for
the purpose of the experiment.

(xiii) Statistical Analysis: The considerations discussed
so far enable us to draw valid conclusions of high precision,
while the statistical analysis provides the way in which the
conclusions can be drawn from the recorded experimental data.
The statistical analysis comprises:

(i) analysis of variance,

(ii) computation of S.E. & C.D.,

(iii) a sketch of the tabular form for presenting the results
and

(iv) an account of the test of significance to be applied
to the proposed design.

(xiv) Report : Finally, the conclusions drawn from the
statistical analysis and comments on them (if any) should be
summarized in the form of a report.

EXERCISE NO. I

Q. No. 1 : Explain the terms given below, mentioning their
roles in field experimentation:

(a) Local control, 

(b) Replication, (M. Sc. Ag. Agra, 1964) 

(c) Randomization and 

(d) Uniformity trials, (M. Sc. Ag .Agra 1958) 

Q. No. 2 : Discuss the important practical considerations in
carrying out field-experiments at the research farms.

(M. Sc. Ag. Agra, 1963) 
Q. No. 3 : Define the following terms with a suitable example 
for each. 

(a) Experimental -unit, 

(b) Treatment, 

(c) Precision & accuracy and 

(d) Random (experimental) error. 

Q. No 4 : Describe the important methods for increasing the 
accuracy of field experiments ? 

(M. Sc. Ag. Agra, 1965) 
Hint : Replication, local control, ancillary observations, and
refinement of experimental technique are the important
methods.

Q. No. 5 : What are uniformity trials ? How do they help in
determining the optimum size and shape of the experimental
plots ?

(M. Sc. Ag. Agra, 1959)




Chapter II 


Completely Randomized Design 
(C. R. D.) 


Description: — 

On the assumption that the whole agricultural field
under experiment is homogeneous, we are led to the simplest type
of design, called the C. R. D. Though this assumption of
homogeneity is too big and practically it is never satisfied
in agricultural field experimentation at least, still in many
laboratory experiments, e.g. in Physics, Chemistry and cookery,
where a quantity of material after thorough mixing is divided
into small samples (units) to which the treatments are applied,
this procedure is the best and the first.

In this layout, each treatment is allotted to
different units entirely by chance (randomly). In particular, if a
treatment is to be applied to four units, the randomization
gives every group of four units an equal chance of receiving
the treatment.

Advantages:— 

(i) All the experimental material can be utilized and
any number of treatments with different replications can be
used.

(ii) The statistical analysis is easy even if the no. of
replications is different for different treatments or if the
experimental error differs from treatment to treatment.

(iii) The analysis remains simple even if one
or more units are missing or rejected. Moreover, the relative
loss of information due to missing data is smaller in
comparison to any other design.

(iv) It provides the max. no. of d.f. to estimate the error 
variance. 

Disadvantages: — 

(i) The main demerit lies in the assumption of
homogeneity. If the whole experimental material is not
homogeneous, then the whole variation among the units enters
the experimental error, thus increasing the error variation and
consequently making the design inefficient, even though for a
given no. of treatments and experimental units this design
provides the max. no. of d.f. for the estimation of error and
thus increases the sensitiveness of the experiment.

(ii) Because it assumes homogeneous material, which is
scarce, this design is seldom used in field experiments and is
replaced by a better substitute, the “Randomized Complete Block
Design” (R.C.B.D.).

Applications: — 

The C. R. D. is appropriate under the following 
situations— 

(i) When the experimental material is homogeneous 
and limited as in laboratory experiments. 

(ii) Where an appreciable fraction of units is likely to 
be destroyed or to fail to respond. 

(iii) In small experiments, where the increased accuracy
from an alternative design does not compensate for the loss of
error d.f.

Randomization:

Suppose we have got three treatments A, B & C, each
replicated four times, so we divide our experimental area into
12 equal-sized plots. Let these plots be numbered in a convenient
way from 1 to 12. Then we consult a random number table
and write down serially the numbers in the order they occur
in the table, neglecting zero, repeated numbers and those greater
than 12. Let these numbers be:

1, 3, 6, 12, 4, 7, 9, 10, 2, 5, 8, 11. Now the first
treatment A will be applied to the plots bearing the numbers
1, 3, 6 & 12, treatment B to the plots bearing the numbers
4, 7, 9 & 10, and treatment C to the remaining plots bearing
the numbers 2, 5, 8 & 11.
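
The same allocation can be carried out by computer. The sketch below is only an illustration in which a pseudo-random shuffle of the plot numbers takes the place of the random number table; the function name `randomize_crd` and the seed are assumptions made here.

```python
import random


def randomize_crd(treatments, replications, rng=None):
    """Allot each treatment to `replications` plots entirely at random.

    Plots are numbered 1..N; the shuffled plot numbers are read off in
    order, the first `replications` of them going to the first treatment,
    the next `replications` to the second, and so on.
    """
    rng = rng or random.Random()
    n_plots = len(treatments) * replications
    order = list(range(1, n_plots + 1))
    rng.shuffle(order)  # plays the role of the random number table
    layout = {}
    for idx, treatment in enumerate(treatments):
        chosen = order[idx * replications:(idx + 1) * replications]
        layout[treatment] = sorted(chosen)
    return layout


# Three treatments, four replications each, as in the text above.
layout = randomize_crd(["A", "B", "C"], 4, rng=random.Random(1))
for treatment, plots in layout.items():
    print(treatment, plots)
```

Fixing the seed makes the layout reproducible; in an actual experiment the seed would not be chosen after looking at the resulting layout.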

Formulation of the hypothesis to be tested: 

In order to test the significance of the difference between 
the treatment-means, let us set up the hypothesis — 

Ho: there is no significant difference between the
treatment-means.

Then we can calculate the prob. of the observed difference,
assuming the null hyp. (Ho) to be true. If we are unable
to calculate this prob., we cannot draw a definite conclusion
from the experiment. This can only be done when the
null hyp. (Ho) is clearly defined. Therefore, the setting up of
the null hypothesis is as much an essential part of the design for
the interpretation of the results as replication and randomization
are for the sensitiveness and validity of the experiment.

We must note that a null hyp. can be rejected but can never
be accepted. Every experiment is designed and performed in
such a way as to give the max. chance for the rejection of the
null hypothesis whenever it is wrong. As soon as the null
hypothesis is rejected, we arrive at a definite conclusion. But
if it is not rejected, we say that there is no evidence against the
null hyp. on the basis of the observations made.

Statistical Analysis:

Suppose we have got $v$ treatments $(t_1, t_2, \ldots, t_v)$
replicated $r_1, r_2, \ldots, r_v$ times respectively. The plot-yields
can be arranged treatment-wise in a tabular form, where $y_{ij}$
denotes the yield of the $j$-th plot under the $i$-th treatment,
$T_i$ the total yield of the $i$-th treatment and $G$ the grand
total of all the plot-yields.

To break up the T.S.S. into two parts:

(i) between the treatments (S.S. due to treatments) and

(ii) within the treatments (S.S. due to error), we proceed
in the following manner:

(i) $\text{T.S.S.} = \sum_i \sum_j y_{ij}^2 - \text{C.F.} = S$ (say),
where $\text{C.F.} = G^2 / N$ and $N = \sum_i r_i$;
$i = 1, 2, \ldots, v$; $j = 1, 2, \ldots, r_i$.

(ii) $\text{S.S. due to treatments} = \sum_i \dfrac{T_i^2}{r_i} - \text{C.F.} = S_1$ (say).

(iii) $\text{S.S. due to error} = \text{T.S.S.} - \text{S.S. due to treatments} = S_2$ (say).

The results are summarized in the following A.V.T.

Source of                                                        F at   F at
variation    D. F.     S. S.    M. S.                  F cal.     5%     1%

Treatment    $v-1$     $S_1$    $V_T = S_1/(v-1)$      $V_T/V_E$
Error        $N-v$     $S_2$    $V_E = S_2/(N-v)$
Total        $N-1$     $S$

Note (1) : If the error variance ($V_E$) is greater than the
variance of the factor of classification, then F is calculated
by keeping $V_E$ in the numerator.

Note (2) : In the analysis of variance table, the calculated
values of F significant at the 5% level are marked with one
star (*) and those significant at 1% are marked with
double stars (**); the latter are said to be highly
significant.

Inference: According to the null hypothesis our treatments
are equally effective (identical in their yielding capacity),
and so under the hypothesis the variations between the
treatments and within the treatments are both due to
chance causes and must not be significantly different.

Under the above consideration,

(i) if $F_{cal} = V_T / V_E$ comes out to be greater than
$F_\alpha(\nu_1, \nu_2)$, then the hypothesis is rejected (where
$\alpha$ is the desired level of significance and $\nu_1 = v - 1$,
$\nu_2 = N - v$ are the d.f. for treatments and error);

(ii) if $F_{cal} < F_\alpha(\nu_1, \nu_2)$, there is no evidence
against the null hypothesis at the $\alpha$% level of significance.

Thus there is one of the two outcomes of the above test:

(i) either the hypothesis is not rejected,
or (ii) the hyp. is rejected.

If the hyp. is not rejected, it means that overall there
are no significant differences between the treatments and we
do not require any further analysis. But if the hyp. is rejected,
we conclude that the treatments have their significant effect
and further want to know which of them is more effective. For
this purpose we shall compare the treatment-means in pairs by
Student's t test given by:

$$t = \frac{|\text{treatment mean-difference}|}{\text{S.E. of the difference}}$$

where, for two treatments replicated $r_1$ and $r_2$ times and
with $V_E$ denoting the error variance (error M.S.),

$$\text{S.E. of the difference} = \sqrt{V_E \left( \frac{1}{r_1} + \frac{1}{r_2} \right)},$$

which reduces to $\sqrt{2 V_E / r}$ when both treatments are
replicated $r$ times.

Critical-difference: 


Instead of calculating Student's t for different pairs of
treatment-means, we can find the least significant difference at
a given level of probability. This difference is known as the
critical-difference (C.D.) and is given by the formula:

$$(\text{C.D.})_{\alpha\%} = (\text{S.E. of the difference}) \times t_{\alpha\%}(\text{error d.f.})$$

where $t_{\alpha\%}(\text{error d.f.})$ stands for the tabulated
value of t at the $\alpha$% level for the error d.f.

If the difference between two treatment-means
comes out to be greater than or equal to the C.D., they are
significantly different; otherwise the difference is not
significant.

While comparing the several treatment-means, we arrange 
them in the desending order of their magnitudes and then 
compare in pairs. The means which do not differ significantly 
are under lined by a bar . 

Report: 

The results obtained and the comments on them are 
summarized in the form of a report, in the end. 
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Exp. No(l): 

Three treatments A,B & C are compared in a completely 
randomized design ( C. R D. ) with six replications for each. 
The lay out and straw-yield in Kgm/plot are given in the 
following table — 


fc 

A 

: 4 17 

\ 

" (19 

A 

29 

C 

33 

B 

23 

B 

21 ; 

B 

A 

A 

C 

C 

B 

15 

25 

17 

35 

29 

23 

A 

C 

B 

C 

A 

C 

33 

25 

19 

37 

.23 

27 ; 


Where A denotes the control. 

B ” ” application of nitrogen 

22 Kgm / Hectre 

C ” ” application of nitrogen 

44 Kgm / Hect. 

Analyse the experimental - yields and state your 
conclusions ? 

Solution: 


For convenience of the statistical analysis, the data is 
arranged in the following tabular-form 
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The values in the brackets denote the squares of the respective yields G=450 „ 

of the plots. (11986) ^ 202600 
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TT OS The treatments A, B & C do not differ significantly 
in their yielding-capacity. 

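$$\text{C.F.} = \frac{G^2}{N} = \frac{(450)^2}{18} = 11250$$

$$\text{T.S.S.} = \sum_i \sum_j y_{ij}^2 - \text{C.F.} = (17)^2 + (29)^2 + \dots + (27)^2 - 11250 = 11986 - 11250 = 736$$

where $y_{ij}$ denotes the yield of the jth plot under the ith treatment, $i = 1, 2, 3$ (so v = 3) and $j = 1, 2, \dots, r_i$ with $r_i = 6$. Also $N = \sum_i r_i = 18$.

$$\text{S.S. due to treatments} = \frac{T_A^2 + T_B^2 + T_C^2}{6} - \text{C.F.} = \frac{69732}{6} - 11250 = 11622 - 11250 = 372$$

$$\text{S.S. due to error} = \text{T.S.S.} - \text{S.S. due to treatments} = 736 - 372 = 364$$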
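The arithmetic above can be checked with a short Python sketch. The yields are grouped by treatment as read from the layout of the experiment; the function name `crd_sums_of_squares` and the variable names are illustrative choices, not notation from the text.

```python
# Straw yields (kg/plot) grouped by treatment, as read from the layout
yields = {
    "A": [17, 29, 25, 17, 33, 23],
    "B": [19, 23, 21, 15, 23, 19],
    "C": [33, 35, 29, 25, 37, 27],
}


def crd_sums_of_squares(groups):
    """Return (C.F., T.S.S., treatment S.S., error S.S.) for a C.R.D."""
    all_y = [y for ys in groups.values() for y in ys]
    n = len(all_y)
    g = sum(all_y)                       # grand total G
    cf = g * g / n                       # correction factor G^2 / N
    tss = sum(y * y for y in all_y) - cf
    treat_ss = sum(sum(ys) ** 2 / len(ys) for ys in groups.values()) - cf
    return cf, tss, treat_ss, tss - treat_ss


cf, tss, treat_ss, error_ss = crd_sums_of_squares(yields)

v = len(yields)                          # number of treatments
n = sum(len(ys) for ys in yields.values())
f_cal = (treat_ss / (v - 1)) / (error_ss / (n - v))

print(cf, tss, treat_ss, error_ss)       # 11250.0 736.0 372.0 364.0
print(round(f_cal, 2))
```

The same function works with unequal replication, since each treatment total is divided by its own number of plots.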


The results are summarized in the following analysis of variance table (A.V.T.):

    Source of     D.F.    S.S.    M.S.S.    F cal.    F tabulated at
    variation                                         5%       1%
    Treat.          2     372     186.00    7.66      3.69     6.37
    Error          15     364      24.27     -         -        -
    Total          17     736       -        -         -        -


The calculated value of F is highly significant, hence the three treatments differ significantly in their yielding capacity.

Further, to see which of the treatments is more effective, we compute the S.E. of the difference between the treatment means and the C.D. in the following manner, where $V_E$ is the error M.S.S. and r the number of replications:

$$\text{S.E. of difference} = \sqrt{\frac{2V_E}{r}} = \sqrt{\frac{2 \times 24.27}{6}} = 2.84$$

$$(\text{C.D.})_{5\%} = (\text{S.E. of difference}) \times t_{.05}(15) = 2.84 \times 2.131 = 6.05$$

Now we write the treatment means in descending order of magnitude:

    Treat.    C     A     B
    Mean     31    24    20
                   ________

The treatment means which do not differ significantly are underlined by a bar.

Report: 

The three treatments A, B & C differ significantly in 
their mean yields and the treatment C has yielded more than A 
or B. The differences between the mean of C and that of B or 
A are significant. A has also shown more average yield than 
that of B but their difference is not significant. Thus we conclude 
that the application of nitrogen has increased the straw-yield. 
This is due to the fact that in the presence of nitrogen 
vegetative growth takes place. 

Note: 

In the above example, if we take the deviations from y = 25, the calculations become much easier.
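A small Python sketch shows why the change of origin is harmless: subtracting a constant from every yield leaves the total sum of squares unchanged, and with the origin at 25 the grand total becomes zero, so the correction factor vanishes. The helper name `total_ss` is an illustrative choice.

```python
def total_ss(values):
    """Total sum of squares: sum of y^2 minus the correction factor."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


# The 18 plot yields of the example (treatments A, B, C in turn)
yields = [17, 29, 25, 17, 33, 23,
          19, 23, 21, 15, 23, 19,
          33, 35, 29, 25, 37, 27]
deviations = [y - 25 for y in yields]

print(sum(deviations))                         # 0, so C.F. = 0
print(total_ss(yields), total_ss(deviations))  # 736.0 736.0
```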
Exercise I

Q. No. (1):

The data represent the sugar yield in tons/acre for five varieties of sugar beet.

    Variety    Plot yields
    A:   1.33   1.35   1.35   1.39   1.38   1.40
    B:   1.31   1.38   1.36   1.37    -      -
    C:   1.35   1.32   1.34   1.31   1.36   1.33
    D:   1.34   1.32   1.36   1.34   1.35    -
    E:   1.33   1.35   1.33   1.36    -      -

Analyse the data and test for a significant difference between the varieties.

Ans: F = 1.80

Q. No. (2):

Three varieties A, B and C of a crop are tested in a completely randomized design with three replications each. The layout and the yield in pounds/plot are given below:

    A  8     B 20     C 14
    B 22     C 23     A 18
    A  7     C 11     B 18

Analyse the experimental yields and state your conclusions.

Ans: F = 2.5
Q. No. (3):

Five plants were selected from each of half a dozen varieties of pea and their pods were counted:

    Variety    Pods/plant
    V1:   17   23   27   25   20
    V2:    6    9    7    6    4
    V3:    9   12   13   11   11
    V4:   14    7   17   20   17
    V5:   25   23   20   32   27
    V6:   67   59   53   61   72

Prepare the analysis of variance table to test the significance of the difference between the average number of pods of the six varieties.

Ans: F = 101.76

Q. No. (4):

The following table gives the butter-fat percentage in cow's milk for four different breeds:

    Breed    Butter fat %
    A:   4.5   4.0   4.5   6.0
    B:   5.0   4.0   5.0   4.5
    C:   4.5   4.5   5.5   5.0
    D:   6.0   5.0   6.5   5.5

Analyse the data and state your conclusions.

Ans: F = 3.84

Q. No. (5):

In a varietal trial involving six varieties of pea, each with 4 replicates, the yields in lbs/plot are given below. Analyse the data and arrange the varieties according to their performance, assuming the data to be homogeneous with respect to the replications.

    Variety    Yield in lbs/plot
    V1:   17.5,   17.3,   28.5,   18.5
    V2:   20.6,   18.8,   29.5,   21.0
    V3:   17.7,   12.7,   26.8,   24.9
    V4:    6.2,    5.0,    9.6,    4.1
    V5:    6.2,    7.0,    5.4,    7.7
    V6:   14.9,   12.5,   16.3,   12.6

Ans: F = 12.2
Q. No. (6):

[a] What is a completely randomized design? Give its applications and advantages.

[b] The following table gives the life-periods in weeks for 4 batches of radio-valves. Test the significance of the difference between the life-periods of the four batches.

    Batch    Life in weeks
    A:   130   138   134   142   150
    B:   138   134   146   150   142
    C:   126   138   142   130   146
    D:   118   106   122   114   118

Ans: F = 13.96




CHAPTER III 


Randomized Block Design (R.B.D.) 


Description : — If the whole experimental area is not homo- 
geneous and the fertility gradient is in one direction only, then it is 
possible to divide the whole area into the homogenous blocks per- 
pendicular to the direction of fertility gradient. Each of the blocks 
constitutes a single replication. If the treatments are randomized 
within each block separately, the result is a randomized block-design. 
This design controls one source of variation in the experimental 
material. Since all the treatments will be applied in each block, 
therefore the blocks will be divided into as many plots as the number 
of treatments. 

As the experimental error has to be estimated from variations within blocks, it is essential that the blocks should be as homogeneous as possible. We know that the total variability of the whole experimental material can be divided into two parts:

(i) Between blocks and

(ii) Within blocks (error). If we reduce one, the other will automatically increase, since the total variation over the whole material is constant. Thus, we arrive at the conclusion that the variation between the blocks should be maximum and that within blocks minimum. At this stage, we must also note that, apart from the treatments to be compared, the whole block (all the plots of a block) should receive uniform agricultural treatment (weeding, hoeing, earthing, harvesting etc.). For example, if the hoeing is to be spread over a number of days, then it must be done for all the plots of a block on the same day.

Applications: In field experiments, this design is used when the fertility gradient is in one direction only. In fact the design can be used wherever it is desired to control one source of variation in the experimental material. For example, in comparing the effects of different diets on the milk yield of cows when they are of different breeds or lactation periods, in comparing the effects of different drugs in controlling a certain disease when the patients are of different age groups, or in comparing the efficiency of a number of salesmen when they are sent to different types of sale areas.

Merits & Demerits: The following are the chief advantages of an R.B.D.

(i) Sensitiveness: This design removes the variation between the blocks from that within blocks, which generally results in a decrease of the experimental error and thus increases the sensitiveness. Cochran has shown that the experimental error of R.B.D. is 60% of that of C.R.D.

(ii) Flexibility: This design allows any number of treatments and replications. The only restriction is that the number of replications is proportional to the number of blocks. Generally, the no. of replications is equal to the no. of blocks, and if extra replication is desired for some treatments, they may be applied to 2 or more units within each block.

Although any no. of replications can be used with any no. of treatments, it is desirable that they should provide at least 12 d.f. for error. If v is the no. of treatments, then the minimum no. of replications ensuring at least 12 d.f. for error is

$$r = 1 + \frac{12}{v - 1}$$

which follows from requiring the error d.f. $(v - 1)(r - 1)$ to be at least 12.
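The rule can be written as a small Python helper. It is a sketch under one added assumption: when 12/(v - 1) is not a whole number, the result is rounded up, since r must be an integer. The function name `min_replications` is illustrative.

```python
def min_replications(v, min_error_df=12):
    """Smallest r with (v - 1) * (r - 1) >= min_error_df in an R.B.D."""
    if v < 2:
        raise ValueError("at least two treatments are needed")
    # Integer ceiling of min_error_df / (v - 1), then add 1
    return 1 + -(-min_error_df // (v - 1))


for v in (2, 3, 4, 5, 7):
    r = min_replications(v)
    print(v, r, (v - 1) * (r - 1))   # treatments, replications, error d.f.
```

For two treatments this gives r = 13, and for five treatments r = 4 with exactly 12 error d.f.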

(iii) Ease of analysis : — The statistical analysis is simple even 
in the case of missing values. 

(iv) Unbiased comparisons can still be made when the experi- 
mental-error-variance is different for different treatments. 

No design is more popular than R.B.D. due to its sensitive- 
ness, flexibility and ease of analysis. 

The demerits of this design lie in the fact that it cannot control variations in the experimental material from two sources, and in such a case it is not an efficient design. Further, if the no. of treatments is large, then the size of the block will increase, which usually results in introducing heterogeneity within the block and thus increasing the experimental error.

The above considerations lead to the conclusion that R.B.D. is not a suitable design when the no. of treatments is large or when the variations in the experimental material are from two sources.

Shape of Blocks and Plots: When the experimenter has a choice of experimental area, he should choose one which seems to be uniform. An idea of the uniformity can be obtained from the appearance of the surface and of the previous crop. After the selection of the experimental area, he has to investigate the fertility gradient of the area. Suppose the fertility gradient is in the direction from A to B:

    A --------> B

[Figure: Blocks I, II and III placed one after another along the direction A to B, each block divided into plots lying in a row across the block.]
Now, he has to divide the whole experimental area into different homogeneous blocks such that the variation between them is maximum. This can be achieved by making the blocks as compact (square) as possible. These compact blocks are arranged one after another along the fertility gradient. Further, each block will be divided into as many plots as the no. of treatments. In dividing the blocks into plots, the object is just the opposite of the above, i.e. the variation within the blocks should be minimum. This object can be attained when the plots are as alike as possible to each other. Thus, the shape of the plots should be rectangular, with the longer side parallel to the direction of the fertility gradient, and all of them should be in a row across the block as shown in the above figure. When the no. of treatments is large, the plots have to be arranged in two or more rows in order to maintain the compactness of the blocks. An exception to this is the case of a sloping experimental site. In this case, the plots are arranged in a single row with their longer sides parallel to the slope, as the fertility gradient is in the direction of the slope. This arrangement sacrifices the compactness of the blocks when the no. of treatments is large.

Layout and Analysis 

Randomization: Suppose we have five varieties A, B, C, D and E and want to replicate each six times. Then the whole field must be divided into six homogeneous blocks, each having five plots of equal size. The randomization is done by consulting a Random Number Table and selecting the one-digit numbers in the order they occur in the table, leaving out 0, repeated numbers and numbers greater than 5. Let these be 1, 5, 2, 4 and 3. Now, taking up the first block, treatment A will be given to the 1st plot of this block, treatment B to the 5th, C to the 2nd, D to the 4th and E to the 3rd plot of the block. Thus the first block has received the 5 treatments in a random manner. In each block, a fresh randomization is made. Thus, six independent randomizations are needed for six blocks.
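The procedure can be imitated in Python, with the standard library's pseudo-random generator standing in for the Random Number Table (an assumption of this sketch, as are the function names). `assign_by_order` reproduces the allotment described for the first block, and `randomize_rbd` draws an independent order for every block.

```python
import random


def assign_by_order(treatments, order):
    """Give the k-th treatment to the plot whose number is order[k]."""
    plots = [None] * len(treatments)
    for treatment, plot_no in zip(treatments, order):
        plots[plot_no - 1] = treatment
    return plots


def randomize_rbd(treatments, n_blocks, seed=None):
    """Draw a fresh random order of plot numbers for every block."""
    rng = random.Random(seed)
    k = len(treatments)
    return [assign_by_order(treatments, rng.sample(range(1, k + 1), k))
            for _ in range(n_blocks)]


# The order 1, 5, 2, 4, 3 used in the text for the first block
first_block = assign_by_order("ABCDE", [1, 5, 2, 4, 3])
print(first_block)   # ['A', 'C', 'E', 'D', 'B']

layout = randomize_rbd("ABCDE", n_blocks=6, seed=1)
for block_no, plots in enumerate(layout, start=1):
    print(block_no, plots)
```

Every block contains each treatment exactly once; only the order of the plots differs from block to block.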

Formulation of Null Hypothesis: In order to test the significance of the differences between the treatment means, we set up the null hypothesis that there is no significant difference between the treatment means, nor between the blocks.

Statistical Analysis: Suppose we have v treatments, each with r replications, and let $y_{ij}$ denote the yield of the plot which is in the jth block and to which the ith treatment is applied. The yields from the original layout can be arranged in the following tabular form:


    Treat/Block     1       2      ...     j      ...     r      Totals
    1              y11     y12     ...    y1j     ...    y1r      T1
    2              y21     y22     ...    y2j     ...    y2r      T2
    ...
    i              yi1     yi2     ...    yij     ...    yir      Ti
    ...
    v              yv1     yv2     ...    yvj     ...    yvr      Tv
    Total          B1      B2      ...    Bj      ...    Br       G


If the original yields are large numbers, then their deviations 
from some arbitrary origin can be used in the above table. 

Here, the sources of variation are— 

(i) treatments, 

(ii) blocks and

(iii) error.

The sums of squares are obtained as follows:

(i) $\text{T.S.S.} = \sum_i \sum_j y_{ij}^2 - \text{C.F.} = S$ (say), where $\text{C.F.} = \dfrac{G^2}{N}$ and $N = vr$

(ii) $\text{S.S. due to treat.} = \sum_i \dfrac{T_i^2}{r} - \text{C.F.} = S_1$ (say)

(iii) $\text{S.S. due to blocks} = \sum_j \dfrac{B_j^2}{v} - \text{C.F.} = S_2$ (say)

(iv) $\text{S.S. due to error} = \text{T.S.S.} - \text{S.S. due to (treat. + blocks)} = S_3$ (say)
Now, we arrive at the following A.V.T.:

    Source of     D.F.              S.S.    M.S.S.                       F calculated             F tabulated at
    variation                                                                                     5%      1%
    Treatment     v - 1             S1      S1/(v - 1) = V_T             V_T/V_E if V_T > V_E     ...     ...
    Block         r - 1             S2      S2/(r - 1) = V_B             V_B/V_E if V_B > V_E     ...     ...
    Error         (v - 1)(r - 1)    S3      S3/((v - 1)(r - 1)) = V_E    -                        -       -
    Totals        vr - 1            S       -                            -                        -       -

If the calculated F corresponding to treatments comes out to be significant at the α% level, then we require further analysis and compute the S.E. and C.D. as follows:

$$\text{S.E. of the difference between two treatment means} = \sqrt{\frac{2V_E}{r}}$$

$$(\text{C.D.})_{\alpha\%} = (\text{S.E. of difference}) \times t_{\alpha\%}(\text{error d.f.})$$

Inference: A significant value of F leads to the conclusion that the corresponding factor has a significant effect on the yield. If the treatment means differ significantly, they are arranged in descending order of magnitude. Those pairs which do not differ significantly are underlined by a bar. Finally, the conclusions obtained from the analysis of the experimental data and the comments on them are summarised in the form of a report.

Exp. (1): Three varieties A, B and C were tested in a R.B.D., each with six replications. The layout and yields in lbs/plot are given in the diagram appended. Analyse the experimental yields and state your conclusions.
Solution:

H0: The three varieties A, B and C and the blocks do not differ significantly in their yielding capacity.

For convenience of calculations, we take the deviations from y = 25 lbs and arrange the data in the following table:

$$\text{C.F.} = \frac{G^2}{N} = \frac{(0)^2}{18} = 0$$

$$\text{T.S.S.} = \sum_i \sum_j y_{ij}^2 - \text{C.F.} = 736 - 0 = 736$$

$$\text{S.S. due to treat.} = \sum_i \frac{T_i^2}{r} - \text{C.F.} = \frac{36 + 900 + 1296}{6} - 0 = \frac{2232}{6} = 372.0$$

$$\text{S.S. due to blocks} = \sum_j \frac{B_j^2}{v} - \text{C.F.} = \frac{36 + 144 + 0 + 324 + 324 + 36}{3} - 0 = \frac{864}{3} = 288.0$$

$$\text{S.S. due to error} = \text{T.S.S.} - \text{S.S. due to (treat. + blocks)} = 736 - (372 + 288) = 76.0$$

Now, we arrive at the following A.V.T.


    Source of     D.F.    S.S.     M.S.S.    F cal.     F tab. at
    variation                                           5%      1%
    Treat.          2     372.0    186.0     24.47**    4.10    7.56
    Blocks          5     288.0     57.6      7.58**    3.33    5.64
    Error          10      76.0      7.6       -         -       -
    Totals         17     736.0      -         -         -       -

The value of F corresponding to treatment comes out to be 
highly significant, hence the treatment means differ significantly in 
their yielding capacity. 

Now, in order to investigate which of the treatment pairs differ significantly, we compute the S.E. of the difference between the treatment means and the C.D. as follows:

$$\text{S.E. of the difference} = \sqrt{\frac{2V_E}{r}} = \sqrt{\frac{2 \times 7.6}{6}} = \sqrt{2.533} = 1.59$$

$$(\text{C.D.})_{5\%} = (\text{S.E. of difference}) \times t_{.05}(10) = 1.59 \times 2.228 = 3.54 \text{ approx.}$$

Now, we arrange the treatment means in decreasing order of their yields:

    Treat:   C     A     B
    Mean:   31    24    20

Inference: The three varieties A, B and C differ significantly at the 1% level. The maximum yield has been recorded for variety C, followed by variety A. The difference between C and A is significant at the 5% level. A has also shown a higher yield than B, and the difference between them is also significant at 5%. Thus the variety C is the best of all as regards the average yield.

Exp. No. (2): An experiment was carried out on wheat with three treatments in four randomized blocks. The plan and the yield per plot in seers are given below:

                     Blocks
       I         II        III        IV
     A  8      C 10      A  6      B 10
     C 12      B  8      B  9      A  5
     B 10      A  8      C 10      C  9

Analyse the data and state the conclusions.

(M. Sc. Ag. Agra, 1959)


Solution:

H0: The three treatments A, B and C and the blocks do not differ significantly.

For convenience of calculations, we take the deviations from y = 9 seers, and arrange the data in the following tabular form:

$$\text{C.F.} = \frac{G^2}{N} = \frac{9}{12} = 0.75$$

$$\text{T.S.S.} = \sum_i \sum_j y_{ij}^2 - \text{C.F.} = 41 - 0.75 = 40.25$$

$$\text{S.S. due to treat.} = \sum_i \frac{T_i^2}{r} - \text{C.F.} = \frac{81 + 1 + 25}{4} - 0.75 = 26.75 - 0.75 = 26.00$$

$$\text{S.S. due to blocks} = \sum_j \frac{B_j^2}{v} - \text{C.F.} = \frac{9 + 1 + 4 + 9}{3} - 0.75 = 7.67 - 0.75 = 6.92$$

$$\text{S.S. due to error} = \text{T.S.S.} - \text{S.S. due to (treat. + blocks)} = 40.25 - (26.00 + 6.92) = 40.25 - 32.92 = 7.33$$
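The sums of squares of this example can be reproduced from the plot yields with a short Python sketch. The yields are entered by treatment in the block order I to IV as read from the plan; the function name `rbd_anova` and the dictionary keys are illustrative. The sketch works on the original yields rather than on deviations from 9, which gives the same sums of squares.

```python
def rbd_anova(table):
    """table maps each treatment to its yields, one per block (same block order)."""
    rows = list(table.values())
    v, r = len(rows), len(rows[0])          # treatments, blocks
    g = sum(map(sum, rows))                 # grand total G
    cf = g * g / (v * r)                    # correction factor
    tss = sum(y * y for row in rows for y in row) - cf
    treat_ss = sum(sum(row) ** 2 for row in rows) / r - cf
    block_ss = sum(sum(col) ** 2 for col in zip(*rows)) / v - cf
    error_ss = tss - treat_ss - block_ss
    error_ms = error_ss / ((v - 1) * (r - 1))
    return {
        "TSS": tss,
        "treat_SS": treat_ss,
        "block_SS": block_ss,
        "error_SS": error_ss,
        "F_treat": treat_ss / (v - 1) / error_ms,
        "F_block": block_ss / (r - 1) / error_ms,
    }


# Yields in seers/plot for blocks I, II, III, IV
wheat = {
    "A": [8, 8, 6, 5],
    "B": [10, 8, 9, 10],
    "C": [12, 10, 10, 9],
}
result = rbd_anova(wheat)
for name, value in result.items():
    print(name, round(value, 2))
```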
Now, we arrive at the following A.V.T.:

    Sources of     D.F.    S.S.     M.S.S.     F cal.    F tab. at
    variation                                            5%      1%
    Treatments       2     26.00    13.00      10.6*     5.14    10.92
    Blocks           3      6.92     2.3067     1.89     4.76     9.78
    Error            6      7.33     1.2222     -        -        -
    Totals          11     40.25     -          -        -        -

The value of F corresponding to treatment comes out to be 
significant at 5% level and so the treatment means differ significantly 
in their yielding capacity. 

Now, to decide which of the treatment pairs differ significantly, we compute the S.E. of the difference between two treatment means and the C.D. as given below:

$$\text{S.E. of the difference} = \sqrt{\frac{2V_E}{r}} = \sqrt{\frac{2 \times 1.2222}{4}} = \sqrt{0.61} = 0.78$$

$$(\text{C.D.})_{5\%} = (\text{S.E. of the difference}) \times t_{.05}(6) = 0.78 \times 2.447 = 1.91 \text{ approx.}$$

Now, we arrange the treatment means in descending order of magnitude:

    Treatment:    C        B       A
    Mean:       10.25     9.25    6.75
                ______________

The treatment means which do not differ significantly from each other as regards their yields are underlined by a bar.

Conclusion: The three treatments A, B and C differ significantly in their yielding capacity. The maximum yield is due to treatment C, followed by B, but their difference is not significant. The treatments B and C both differ significantly from treatment A.
Exp. No. (3): The per-plot yields and a part of the analysis of variance of a variety trial conducted in randomized blocks are given below:

Yields/plot

    Variety              Blocks
                I     II    III    IV     V
    A          20     19     14    15    17
    B          23     21     19    19    18
    C          25     21     18    21    20
    D          20     19     17    13    21

Analysis of Variance

    Source of Variation    D.F.    S.S.
    Blocks                   4      72
    Varieties                -       -
    Error                    -       -
    Totals                  19     158

Complete the analysis of variance. What conclusions would you draw from the experiment? (M. Sc. Ag. Agra, 1960)

Solution:

H0: The four varieties A, B, C and D and the blocks do not differ significantly in their yielding capacity.

For convenience we take the deviations from y = 19 and then arrange the data in the following tabular form (squares in brackets):

    Variety                        Blocks                             Totals      T^2    Mean
               I         II        III        IV         V
    A          1 (1)     0 (0)    -5 (25)    -4 (16)    -2 (4)       -10 (46)     100     17
    B          4 (16)    2 (4)     0 (0)      0 (0)     -1 (1)         5 (21)      25     20
    C          6 (36)    2 (4)    -1 (1)      2 (4)      1 (1)        10 (46)     100     21
    D          1 (1)     0 (0)    -2 (4)     -6 (36)     2 (4)        -5 (45)      25     18
                                                                   G = 0 (158)
$$\text{C.F.} = \frac{G^2}{N} = \frac{(0)^2}{20} = 0$$

$$\text{S.S. due to varieties} = \sum_i \frac{T_i^2}{r} - \text{C.F.} = \frac{100 + 25 + 100 + 25}{5} - 0 = \frac{250}{5} - 0 = 50$$

$$\text{T.S.S.} = \sum_i \sum_j y_{ij}^2 - \text{C.F.} = 158 - 0 = 158 \text{ (given)}$$

$$\text{S.S. due to error} = \text{T.S.S.} - \text{S.S. due to (varieties + blocks)} = 158 - (50.0 + 72.0) = 158 - 122 = 36.0$$

Now, the analysis of variance table can be completed in the following manner:

    Source of     D.F.    S.S.     M.S.S.    F cal.    F tab. at
    variation                                          5%      1%
    Blocks          4      72.0    18.0      6.0**     3.26    5.41
    Varieties       3      50.0    16.7      5.56*     3.49    5.95
    Error          12      36.0     3.0      -         -       -
    Totals         19     158.0     -        -         -       -


The calculated value of F corresponding to varieties comes out to be significant at the 5% level, while that corresponding to blocks is highly significant.

Thus, we conclude that the varieties A, B, C and D are significantly different in their yielding capacity at the 5% level of significance.

In order to investigate which of the variety pairs differ significantly, we compute further the S.E. of the difference between two variety means and the C.D. as shown below:

$$\text{S.E. of the difference} = \sqrt{\frac{2V_E}{r}} = \sqrt{\frac{2 \times 3}{5}} = \sqrt{1.2} = 1.095$$

$$(\text{C.D.})_{5\%} = (\text{S.E. of the difference}) \times t_{.05}(12) = 1.095 \times 2.179 = 2.39$$

Now, we arrange the variety means in decreasing order of magnitude:

    Variety:   C     B     D     A
    Mean:     21    20    18    17
              ________
                    ________
                          ________

The variety means which do not differ significantly from each other at the 5% level are underlined by a bar.

Conclusion: The variety C produces the maximum yield, followed by variety B, but their difference is not significant at the 5% level. The variety C differs significantly from both D and A. The varieties B and D, and also D and A, do not differ significantly, and the variety A has the minimum (least) yielding capacity. The varieties and the blocks differ significantly at the 5% and 1% levels of significance respectively.


Exp. No. (4): A varietal trial was conducted in a randomized block design with 9 varieties and 5 replications. In analysing the yield data, the following sums of squares were obtained:

    Blocks:      388.06
    Varieties:   731.75
    Error:       635.65

(a) Construct the analysis of variance and calculate the critical difference.

(b) Arrange the varieties in order of performance. The variety means were:

    Variety:                1       2       3       4       5       6       7       8       9
    Yield in mds/acre:    20.80   21.40   21.40   18.50   19.80   24.90   15.40   14.00   11.30

(M. Sc. Ag. Agra, 1963)
Solution:

H0: The varieties and the blocks do not differ significantly.

(a) Under the above hypothesis, we prepare the following analysis of variance table:

    Sources of     D.F.    S.S.        M.S.S.      F cal.    F tab. at
    variation                                                5%       1%
    Blocks           4      388.06     97.015      4.88**    2.674    3.982
    Varieties        8      731.75     91.46875    4.60**    2.252    3.134
    Error           32      635.65     19.86       -         -        -
    Totals          44     1755.46     -           -         -        -

The calculated values of the corresponding F's indicate that the blocks as well as the varieties differ significantly in their yielding capacity.

Now, the C.D. will be computed in the following manner:

$$\text{S.E. of the difference} = \sqrt{\frac{2V_E}{r}} = \sqrt{\frac{2 \times 19.86}{5}} = \sqrt{7.944} = 2.82$$

$$(\text{C.D.})_{5\%} = (\text{S.E. of difference}) \times t_{.05}(32) = 2.82 \times 2.038 = 5.75$$

and

$$(\text{C.D.})_{1\%} = 2.82 \times t_{.01}(32) = 2.82 \times 2.741 = 7.73$$

(b)
    Variety:     6       2       3       1       5       4       7       8       9
    Mean:      24.90   21.40   21.40   20.80   19.80   18.50   15.40   14.00   11.30

Inference: The above arrangement gives the idea that variety no. six has the maximum yielding capacity and variety no. nine has the minimum yielding capacity in the experiment performed.
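When only the sums of squares are supplied, as in this example, the rest of the table follows mechanically. The Python sketch below builds the mean squares, the F ratios, the two critical differences and the ranking. The tabulated t values 2.038 and 2.741 are the ones quoted in the solution, and all variable names are illustrative.

```python
import math

v, r = 9, 5                                   # varieties, replications
ss = {"Blocks": 388.06, "Varieties": 731.75, "Error": 635.65}
df = {"Blocks": r - 1, "Varieties": v - 1, "Error": (v - 1) * (r - 1)}
ms = {source: ss[source] / df[source] for source in ss}

f_blocks = ms["Blocks"] / ms["Error"]
f_varieties = ms["Varieties"] / ms["Error"]

se_diff = math.sqrt(2 * ms["Error"] / r)      # S.E. of a difference of two means
cd_5 = se_diff * 2.038                        # t at 5% for 32 d.f., as quoted
cd_1 = se_diff * 2.741                        # t at 1% for 32 d.f., as quoted

means = {1: 20.80, 2: 21.40, 3: 21.40, 4: 18.50, 5: 19.80,
         6: 24.90, 7: 15.40, 8: 14.00, 9: 11.30}
ranking = sorted(means, key=means.get, reverse=True)

print(round(f_blocks, 2), round(f_varieties, 2))
print(round(se_diff, 2), round(cd_5, 2), round(cd_1, 2))
print(ranking)   # [6, 2, 3, 1, 5, 4, 7, 8, 9]
```

Because the sketch carries the unrounded error mean square through the calculation, its critical differences may differ from the hand-computed ones in the last decimal place.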
EXERCISE II

Q. No. 1. Plan a varietal trial to test five improved varieties of wheat in order to select a suitable variety for your locality. Make use of R.B.D. with six replications. Construct the analysis of variance table and indicate how you would calculate the critical difference.

(M. Sc. Ag. Agra, 1963)

Q. 2. (a) What considerations help you in determining the shape and arrangement of the blocks and plots in a field experiment using R.B.D.?

(b) Give the relative merits and demerits of the R.B.D. over the C.R.D. (M. Sc. Ag. Agra, 1965)

Q. 3. (a) Randomize five treatments A, B, C, D and E to the plots of the following block:

    Plot    1    2    3    4    5

(b) What should be the minimum number of replications to compare two treatments v1 and v2 so that the d.f. for error be 12? Ans. r = 13

Q. 4. The yields per plot for three varieties of maize, each with 6 replications, are given in the following table. Prepare the analysis of variance table and test the homogeneity between the three varieties A, B and C and between their replications.

    Replications         VARIETIES
                     A       B       C
    1.              252     213     199
    2.               46     112      60
    3.               29     133     165
    4.               48      62      21
    5.               10      27     154
    6.               38      41     116

Ans. $F = \dfrac{V_T}{V_E} = 1.49$ and $F = \dfrac{V_B}{V_E} = 5.28$

Q. 5. (a) Define a R.B.D. and write down the procedure of randomization in a field experiment by taking a suitable example.

(b) A district N.C.C. headquarters appointed five N.C.C. officers A, B, C, D and E, each to five degree colleges in the district, to impart a certain military training to the N.C.C. cadets in the institutions. The numbers of cadets as trainees under the guidance of each officer in the five colleges are recorded below. Test the homogeneity of the data with respect to the N.C.C. officers and the different colleges.

    Officers                Colleges
                 I      II     III     IV      V
    A           50      70     115     95     100
    B           55      75     120     85     105
    C           45      60     100     80      95
    D           40      65      85     85     100
    E           60      75      95     75     110

Ans. (b) F = 47.0 for colleges
and F = 2.99 for officers

Q. 6. A farmer divided his 24 cows of 6 breeds into 4 groups and fed them 4 rations A, B, C and D for a fortnight. The increases in milk yield in ounces/cow were then recorded as given in the following table. Analyse the data and test whether there is any significant difference between the four rations and between the breeds at the 5% level of significance. Given that F.05 (3, 15) and F.05 (5, 15) are 3.29 and 2.90 respectively.

    Rations                   Breeds
                I      II     III     IV      V      VI
    A          20      22      20     22     24      24
    B          26      24      20     24     22      22
    C          24      22      22     26     24      22
    D          26      28      24     30     26      22

Ans. F = 2.23 for breeds
and F = 5.03 for rations
Q. 7. Below are given the marks obtained by 12 candidates who appeared in a P.S.C. interview held on 4 current topics by a committee of 3 experts, each handling four candidates for the different current topics. Analyse the data and test for homogeneity between the interviewers and between the four current topics.

    Interviewers              Topics
                     I       II      III      IV
    A               8.0     26.0     15.0    21.0
    B              22.0     25.0     24.0    36.0
    C              11.0     27.0     13.0    21.0

Ans. F = 5.76 for interviewers
and F = 6.2 for topics

Q. 8. The following table gives the number of pods/plant for 20 pea plants of 4 different varieties v1, v2, v3, v4. Test the significance of the difference between the varieties and between the plants.

    Variety               Plants
               I      II     III     IV      V
    v1        23      26      30     22     25
    v2        29      23      27     30     26
    v3        21      23      26     24     25
    v4        25      22      29     27     28

Ans. F = 2.03 for plants
and F = 1.67 for varieties
Latin Square Design (L.S.D.)


Description: In R.B.D., the whole experimental area is divided into homogeneous blocks and randomization is restricted within the blocks, i.e. it is subject to one restriction only. In L.S.D., the experimental area is divided into rows and columns such that the no. of rows and columns is equal and each treatment (denoted by Latin letters) occurs only once in each row and each column. Thus the randomization is subject to 2 restrictions. This arrangement removes (eliminates) the variation between the rows and between the columns from that within them and reduces the experimental error considerably.

Applications: In field experimentation, it is used:

(i) When the fertility gradient is in one direction but that direction is not known.

(ii) When the fertility gradient is in two directions at right angles.

In fact, the L.S.D. can be applied in all cases where the variation in the experimental material is from two orthogonal sources. It is used in industry, animal husbandry, piggery, greenhouse work, and the biological and social sciences where it is desired to control simultaneously two factors contributing to the experimental error.

Relative merits and demerits of L.S.D. over R.B.D.: Although the L.S.D. is an improvement over the R.B.D., there are situations where the R.B.D. is used instead of the L.S.D.:

(i) R.B.D. can be used with any no. of treatments and replications, but L.S.D. is a suitable design only for 5 to 12 treatments.

(ii) There is no restriction on the no. of replications in a R.B.D., while the no. of treatments and replications must be equal in a L.S.D. This restriction puts a limit on its applications.

(iii) In the case of missing plots, the statistical analysis is simple in a R.B.D., but in a L.S.D. it becomes somewhat complex, especially when several units are missing.

(iv) In the field, the R.B.D. is easier to manage than a L.S.D., as it can be laid out equally well in a rectangular or square field or a field of any other shape, while the L.S.D. necessitates an approximately square field.

The merit of the L.S.D. lies in the fact that it controls simultaneously two factors contributing to the experimental error. Thus the L.S.D. is a more suitable design than a R.B.D. where the fertility gradient is in 2 directions at right angles or in one unknown direction.

Replications: In a L.S.D. the no. of replications must be equal to the no. of treatments. Due to this fact, the design is not suitable for a large no. of treatments and is rarely used for more than 12 treatments. On the ground of error d.f., it is not suitable for fewer treatments either, i.e. fewer than 5. The error d.f. for a 2 x 2, 3 x 3 and 4 x 4 L.S.D. are 0, 2 and 6 respectively. According to Prof. R. A. Fisher, it is most suitable for 5 to 8 treatments and can be used up to 12.
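The error d.f. quoted above follow from the partition of the total d.f. in a K x K square: the total is K^2 - 1, and rows, columns and treatments take K - 1 each. A minimal Python sketch, with the illustrative function name `lsd_error_df`:

```python
def lsd_error_df(k):
    """Error d.f. of a k x k Latin square: (k*k - 1) - 3*(k - 1) = (k - 1)(k - 2)."""
    total_df = k * k - 1
    return total_df - 3 * (k - 1)   # remove rows, columns and treatments


for k in range(2, 9):
    print(k, lsd_error_df(k))
```

A 5 x 5 square is the smallest that leaves 12 d.f. for error.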

Randomization: (a) In the randomization of a K x K L.S.D. (K ≤ 4), the first step is to select a reduced L.S. (standard L.S.) from the set of reduced Latin squares. A standard L.S. is one which has the letters in alphabetical order in the first row and the first column.

For the 2 x 2, 3 x 3 and 4 x 4 Latin squares, the numbers of standard (reduced) Latin squares are 1, 1 and 4 respectively, and they are:

    For 2 x 2        For 3 x 3

    A  B             A  B  C
    B  A             B  C  A
                     C  A  B

    For 4 x 4

    (i)              (ii)
    A  B  C  D       A  B  C  D
    B  C  D  A       B  A  D  C
    C  D  A  B       C  D  B  A
    D  A  B  C       D  C  A  B

    (iii)            (iv)
    A  B  C  D       A  B  C  D
    B  A  D  C       B  D  A  C
    C  D  A  B       C  A  D  B
    D  C  B  A       D  C  B  A


After the selection of a reduced L.S., the second step is to randomize the order of all the K columns and of the last (K - 1) rows.
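Both steps of procedure (a) can be sketched in Python. The checks confirm that the four 4 x 4 squares listed are Latin squares with an alphabetical first row and first column, and `randomize_small_square` shuffles all K columns and the last K - 1 rows. The function names are illustrative, and the pseudo-random generator stands in for a table of random numbers.

```python
import random


def is_latin_square(sq):
    """Every symbol occurs exactly once in each row and each column."""
    k = len(sq)
    symbols = set(sq[0])
    rows_ok = all(len(row) == k and set(row) == symbols for row in sq)
    cols_ok = all({row[c] for row in sq} == symbols for c in range(k))
    return len(symbols) == k and rows_ok and cols_ok


def is_reduced(sq):
    """Latin square whose first row and first column are in alphabetical order."""
    first_row = list(sq[0])
    first_col = [row[0] for row in sq]
    return (is_latin_square(sq)
            and first_row == sorted(first_row)
            and first_col == sorted(first_col))


def randomize_small_square(sq, seed=None):
    """Shuffle all K columns and the last K - 1 rows of a reduced square."""
    rng = random.Random(seed)
    k = len(sq)
    col_order = rng.sample(range(k), k)
    row_order = [0] + rng.sample(range(1, k), k - 1)
    return ["".join(sq[i][j] for j in col_order) for i in row_order]


reduced_4x4 = [
    ["ABCD", "BCDA", "CDAB", "DABC"],   # (i)
    ["ABCD", "BADC", "CDBA", "DCAB"],   # (ii)
    ["ABCD", "BADC", "CDAB", "DCBA"],   # (iii)
    ["ABCD", "BDAC", "CADB", "DCBA"],   # (iv)
]
print(all(is_reduced(sq) for sq in reduced_4x4))   # True

layout = randomize_small_square(reduced_4x4[0], seed=7)
print(layout, is_latin_square(layout))
```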

(b) For the randomization of a K x K (K ≥ 5) L.S., construct any L.S. in the first step and then randomize the order of all its rows, columns and treatments (letters). The detailed procedure for a 5 x 5 L.S. is given below:
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(i) Construct a# 5x5 L. S,, let it be. 
Row No. 

1 

2 

3 


4 

5 

(ii) Randomize the order of the rows of the above L. S. Let the random
numbers be 4, 3, 2, 5 and 1. According to these numbers, the
fourth row of the above L. S. will be written in place of the first row,
the third in place of the second, the second in place of the third, the
fifth in place of the fourth and the first in place of the fifth.
Arranging them in this order, we have

Col. No.    1  2  3  4  5

            D  E  B  A  C
            C  D  A  E  B
            B  A  E  C  D
            E  C  D  B  A
            A  B  C  D  E

(iii) Randomize the order of the columns according to the
random numbers 3, 1, 4, 5, 2 (a fresh set of random numbers). Then
we obtain

            B  D  A  C  E
            A  C  E  B  D
            E  B  C  D  A
            D  E  B  A  C
            C  A  D  E  B


(iv) Finally, randomize the Latin letters. Let the freshly selected
set of random numbers be 2, 1, 5, 3 and 4. According
to this set, B will be written for A, A for B, E for C, C for D and
D for E. Relabelling the letters in this order, we get

            A  C  B  E  D
            B  E  D  A  C
            D  A  E  C  B
            C  D  A  B  E
            E  B  C  D  A
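The three randomization steps (rows, columns, letters) can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the starting square is the one implied by step (ii) of the worked example, and the pseudo-random generator of the `random` module stands in for a table of random numbers.

```python
import random


def is_latin_square(square):
    """True if every letter occurs exactly once in each row and each column."""
    k = len(square)
    letters = set(square[0])
    rows_ok = all(len(row) == k and set(row) == letters for row in square)
    cols_ok = all({row[j] for row in square} == letters for j in range(k))
    return len(letters) == k and rows_ok and cols_ok


def rearrange(square, row_order, col_order, relabel):
    """Reorder rows and columns (0-based positions) and rename the letters."""
    return ["".join(relabel[square[r][c]] for c in col_order) for r in row_order]


def randomize(square, rng):
    """Draw random row, column and letter orders and apply them."""
    k = len(square)
    letters = sorted(set(square[0]))
    relabel = dict(zip(letters, rng.sample(letters, k)))
    return rearrange(square, rng.sample(range(k), k), rng.sample(range(k), k), relabel)


start = ["ABCDE", "BAECD", "CDAEB", "DEBAC", "ECDBA"]

# The worked example: rows 4, 3, 2, 5, 1; columns 3, 1, 4, 5, 2;
# B written for A, A for B, E for C, C for D and D for E.
book = rearrange(start, [3, 2, 1, 4, 0], [2, 0, 3, 4, 1],
                 {"A": "B", "B": "A", "C": "E", "D": "C", "E": "D"})
print(book)

# A fresh randomization; the seed is arbitrary and only makes the run repeatable.
layout = randomize(start, random.Random(5))
print(layout, is_latin_square(layout))
```

Whatever orders are drawn, the result remains a Latin square, because permuting rows, columns or letters never repeats a letter within a row or a column.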


Layout: In field experiments, a L. S. is generally laid out on
a square or nearly square area, but this is not necessary. As the object
of this design is to control variation in two directions, it can also be
applied when the plots are in a row or a set of rows. In this case, the
compact blocks are taken as the rows of the L. S. and the order of the
plots within the blocks as the columns. This type of arrangement is
usually adopted in greenhouse experimentation.

Formulation of Null Hypothesis: In order to compare the
treatments in a L. S. D., we set up the null hypothesis that the data
are homogeneous, i. e. the variation in the data is due to chance (error)
only.

Statistical Analysis: To reduce the bulk of calculations, we
generally take the deviations of the yields from some appropriate
origin in a K x K L. S. D. and then obtain the totals for rows,
columns and treatments, which are usually denoted by
$(R_1, R_2, \ldots, R_K)$, $(C_1, C_2, \ldots, C_K)$ and
$(T_1, T_2, \ldots, T_K)$ respectively. Then the sums of
squares (S. S.) for rows, columns and treatments are computed by
the following formulae:

T. S. S. $= \sum_i \sum_j y_{ij}^2 - \text{C. F.} = S$ (say),
where $i, j = 1, 2, \ldots, K$ and C. F. $= G^2/K^2 = G^2/N$ $(K^2 = N)$.

Here $G$ is the grand total and $y_{ij}$ stands for the yield of the plot
in the i-th row and j-th column, for the specified treatment written
above it by a Latin letter.

S. S. due to rows $= \sum_i R_i^2 / K - \text{C. F.} = S_1$ (say),
where $R_i$ stands for the total of the i-th row.

S. S. due to columns $= \sum_j C_j^2 / K - \text{C. F.} = S_2$ (say),
where $C_j$ stands for the total of the j-th column.

S. S. due to treatments $= \sum_i T_i^2 / K - \text{C. F.} = S_3$ (say),
where $T_i$ stands for the total of the i-th treatment $(i = 1, 2, \ldots, K)$.

S. S. due to error = T. S. S. - S. S. due to (rows + columns + treatments)
$= S - (S_1 + S_2 + S_3) = S_4$ (say)
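Now we arrive at the following analysis of variance table:

Source of variation   D. F.            S. S.   M. S. S.                       F
Rows                  K - 1            S_1     V_R = S_1/(K - 1)              V_R/V_E
Columns               K - 1            S_2     V_C = S_2/(K - 1)              V_C/V_E
Treatments            K - 1            S_3     V_T = S_3/(K - 1)              V_T/V_E
Error                 (K - 1)(K - 2)   S_4     V_E = S_4/[(K - 1)(K - 2)]     -
Total                 K^2 - 1          S       -                              -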

If the value of F corresponding to the treatments comes out
to be significant at the 5% or the 1% level of significance, we require a
further analysis of the treatments to decide the significance of
the difference between any two treatments. For this purpose, we compute
the S. E. of the difference of the treatment means and the C. D. at
the same level α%, as given below:

S. E. of the difference between two treatment means $= \sqrt{2 V_E / K}$,
where $V_E$ is the error mean square, and

(C. D.)$_{\alpha\%}$ = (S. E. of the difference) $\times\, t_{\alpha\%}$(error d. f.)

Inference: Any significant value of F leads to the conclusion
that the corresponding factor has a significant effect on the plot
yield. If the treatments differ significantly, their means are
arranged in decreasing order of magnitude. The pairs of treatment
means for which the differences are less than the C. D. are
underlined by a bar, indicating that their differences are not
significant, i. e. the two do not differ significantly from each other
at the given level of significance.

Finally, the conclusions and the comments on them are
summarized in the form of a report.
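The sums of squares and the analysis of variance table described above can be sketched in Python as follows. The function name `latin_square_ss` and its return layout are assumptions made for illustration; the plan and yields used to exercise it are those of the 4 x 4 wheat trial of Exp. 2 in this chapter.

```python
def latin_square_ss(plan, yields):
    """Analysis of variance of a K x K Latin square (K >= 3).

    plan[i][j] is the treatment letter and yields[i][j] the yield of the
    plot in row i and column j. Returns {source: (d.f., S.S., M.S.S., F)}.
    """
    k = len(yields)
    grand_total = sum(map(sum, yields))
    cf = grand_total ** 2 / (k * k)                       # correction factor
    total_ss = sum(y * y for row in yields for y in row) - cf
    row_ss = sum(sum(row) ** 2 for row in yields) / k - cf
    col_ss = sum(sum(col) ** 2 for col in zip(*yields)) / k - cf
    treat_totals = {}
    for plan_row, yield_row in zip(plan, yields):
        for letter, y in zip(plan_row, yield_row):
            treat_totals[letter] = treat_totals.get(letter, 0) + y
    treat_ss = sum(t * t for t in treat_totals.values()) / k - cf
    error_ss = total_ss - (row_ss + col_ss + treat_ss)
    error_df = (k - 1) * (k - 2)
    error_ms = error_ss / error_df
    table = {}
    for name, ss in (("rows", row_ss), ("columns", col_ss), ("treatments", treat_ss)):
        ms = ss / (k - 1)
        table[name] = (k - 1, ss, ms, ms / error_ms)
    table["error"] = (error_df, error_ss, error_ms, None)
    table["total"] = (k * k - 1, total_ss, None, None)
    return table


plan = ["CBAD", "ADCB", "BADC", "DCBA"]
yields = [[25, 23, 20, 20], [19, 19, 21, 18], [19, 14, 17, 20], [17, 20, 21, 15]]
table = latin_square_ss(plan, yields)
for source, line in table.items():
    print(source, line)
```

The sketch works on the raw yields; shifting the origin, as done by hand in the text, changes the correction factor but none of the sums of squares.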

Exp. (1) Carry out the analysis of the following L. S. D.

     50     70     70     80     90
     A      B      C      D      E

     70     90     80     80     50
     B      C      D      E      A

     60     50     90     80     90
     C      D      E      A      B

     50     60     80     50     70
     D      E      A      B      C

     80     90     50     70     60
     E      A      B      C      D

(B. A. Vikram, 1961)

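Solution

H0 : The data are homogeneous.

For the convenience of calculations, we shift the origin to
y = 70 and prepare the following table to compute the sums of
squares:

Row \ Col.      1        2        3        4        5     Total R    R^2
   1          -20 A     0 B      0 C     10 D     20 E       10       100
   2            0 B    20 C     10 D     10 E    -20 A       20       400
   3          -10 C   -20 D     20 E     10 A     20 B       20       400
   4          -20 D   -10 E     10 A    -20 B      0 C      -40      1600
   5           10 E    20 A    -20 B      0 C    -10 D        0         0
Total C       -40       10       20       10       10     G = 10
C^2          1600      100      400      100      100

Treat.          A        B        C        D        E
Total T         0      -20       10      -30       50
T^2            (0)    (400)    (100)    (900)   (2500)

The sum of the squares of all the deviations is 5300.

C. F. $= G^2/N = (10)^2/25 = 4.0$

T. S. S. $= \sum_i \sum_j y_{ij}^2 - \text{C. F.} = 5300 - 4.0 = 5296.0$

S. S. due to rows $= \sum_i R_i^2/5 - \text{C. F.} = (100 + 400 + 400 + 1600 + 0)/5 - 4.0$
$= 2500/5 - 4.0 = 500 - 4.0 = 496.0$

S. S. due to columns $= \sum_j C_j^2/5 - \text{C. F.} = (1600 + 100 + 400 + 100 + 100)/5 - 4.0$
$= 2300/5 - 4.0 = 460 - 4.0 = 456.0$

S. S. due to treat. $= \sum_i T_i^2/5 - \text{C. F.} = (0 + 400 + 100 + 900 + 2500)/5 - 4.0$
$= 3900/5 - 4.0 = 780 - 4.0 = 776.0$

S. S. due to error = T. S. S. - S. S. due to (rows + cols. + treat.)
= 5296 - (496 + 456 + 776) = 5296 - 1728
= 3568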

Now we arrive at the following A. V. T.:

Source of variation   D. F.   S. S.   M. S. S.   F cal.   F .05
Rows                    4      496     124.00     0.42     3.26
Columns                 4      456     114.00     0.38     3.26
Treatments              4      776     194.00     0.65     3.26
Error                  12     3568     297.33      -        -
Total                  24     5296       -         -        -

The calculated values of F corresponding to rows, columns and
treatments indicate that none of them is significant at the 5% level of
significance. Thus the L. S. D. provides no improvement over a
C. R. D. in this case.

Inference: The data are homogeneous.

Exp. 2. A varietal trial was conducted on wheat with four
varieties in a L. S. D. The plan of the experiment and the per-plot
yields are given below:

     C 25     B 23     A 20     D 20
     A 19     D 19     C 21     B 18
     B 19     A 14     D 17     C 20
     D 17     C 20     B 21     A 15

Analyse the data and interpret the results.

(M. Sc. Ag. Agra, 1961)

Solution

H0 : The data are homogeneous.

To carry out the analysis, we take the deviations from y = 20
for convenience in calculations and arrange the data in the
following tabular form:


Row \ Col.      1        2        3        4      Total R    R^2
                C        B        A        D
   1            5        3        0        0         8        64
              (25)      (9)      (0)      (0)      (34)
                A        D        C        B
   2           -1       -1        1       -2        -3         9
               (1)      (1)      (1)      (4)       (7)
                B        A        D        C
   3           -1       -6       -3        0       -10       100
               (1)     (36)      (9)      (0)      (46)
                D        C        B        A
   4           -3        0        1       -5        -7        49
               (9)      (0)      (1)     (25)      (35)
Total C         0       -4       -1       -7     G = -12   G^2 = 144
              (36)     (46)     (11)     (29)     (122)
C^2             0       16        1       49

Variety         A        B        C        D
Total T       -12        1        6       -7
T^2          (144)      (1)     (36)     (49)
Mean         17.00    20.25    21.50    18.25

The figures in brackets are the squares of the deviations and their totals.
C. F. $= G^2/N = 144/16 = 9.0$

T. S. S. $= \sum_i \sum_j y_{ij}^2 - \text{C. F.} = 122 - 9.0 = 113.0$

S. S. due to rows $= \sum_i R_i^2/4 - \text{C. F.} = (64 + 9 + 100 + 49)/4 - 9.0$
$= 222/4 - 9.0 = 55.5 - 9.0 = 46.5$

S. S. due to columns $= \sum_j C_j^2/4 - \text{C. F.} = (0 + 16 + 1 + 49)/4 - 9.0$
$= 66/4 - 9.0 = 16.5 - 9.0 = 7.5$

S. S. due to varieties $= (T_A^2 + T_B^2 + T_C^2 + T_D^2)/4 - \text{C. F.}$
$= (144 + 1 + 36 + 49)/4 - 9.0 = 230/4 - 9.0 = 57.5 - 9.0 = 48.5$

S. S. due to error = T. S. S. - S. S. due to (rows + columns + varieties)
= 113 - (46.5 + 7.5 + 48.5) = 113 - 102.5
= 10.5

Now we arrive at the following A. V. T.:

Source of variation   D. F.   S. S.    M. S. S.   F cal.   F .05   F .01
Varieties               3     48.5     16.167     9.23*    4.76    9.78
Rows                    3     46.5     15.5       8.85*    4.76    9.78
Columns                 3      7.5      2.5       1.42     4.76    9.78
Error                   6     10.5      1.75       -        -       -
Totals                 15    113.0       -         -        -       -

The calculated values of F corresponding to varieties and
rows indicate that the varieties and the rows differ significantly
at the 5% level of significance. The non-significant value of F
corresponding to columns indicates that the L. S. D. is not an
improvement over the R. B. D. in this case.


Now, to determine which of the variety pairs differ significantly,
we require the S. E. of the difference of the variety means and
the C. D.

S. E. of mean difference $= \sqrt{2 V_E / K} = \sqrt{2 \times 1.75 / 4} = \sqrt{0.875} = 0.935$

(C. D.)$_{5\%}$ = (S. E. of difference) $\times\, t_{.05}(6)$
= 0.935 x 2.447 = 2.288 ≈ 2.29

Now we arrange the variety means in decreasing order of magnitude:

Variety :    C        B        D        A
Mean    :  21.50    20.25    18.25    17.00

The varieties which do not differ significantly from each other are
underlined by a bar (here the pairs C and B, B and D, and D and A).

Conclusion: The varieties have a significant effect on yield at the
5% level. Variety C has the maximum yielding capacity, followed
by variety B, but C does not differ significantly from B. Variety
C differs significantly from each of D and A. Variety B
does not differ significantly from D but differs significantly from
A. We also note that varieties D and A do not differ significantly
as regards their average yield.
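The comparison of variety means against the C. D. can be sketched in Python as below. The means, the error mean square (1.75) and the tabulated t (2.447 for 6 d. f. at 5%) are the figures of the example above; the variable names are illustrative only.

```python
import math
from itertools import combinations

means = {"A": 17.00, "B": 20.25, "C": 21.50, "D": 18.25}
error_ms = 1.75        # V_E from the analysis of variance table
k = 4                  # number of replications of each variety
t_05 = 2.447           # tabulated t at 5% for 6 error d. f.

se_diff = math.sqrt(2 * error_ms / k)
cd = se_diff * t_05
print(f"S. E. of difference = {se_diff:.3f}, C. D. = {cd:.3f}")

# Compare every pair, taking the varieties in decreasing order of mean.
ranked = sorted(means, key=means.get, reverse=True)
significant = []
for first, second in combinations(ranked, 2):
    diff = means[first] - means[second]
    if diff > cd:
        significant.append((first, second))
    verdict = "significant" if diff > cd else "not significant"
    print(f"{first} - {second} = {diff:.2f}: {verdict}")
```

The pairs left out of `significant` are exactly those that would be joined by a bar in the written report.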
Exercise III

Q. 1. (a) Compare the advantages and disadvantages of the
R. B. D. and L. S. D. in field trials. (M. Sc. Ag. Agra, 1960, 62)

(b) In a trial of five varieties of wheat, A, B, C, D and E, laid
out in a Latin square, the following yields (in oz./plot) were
obtained:

     B  36     E  56     C 164     A 120     D  80
     E  66     D  64     B  76     C 178     A  60
     C 118     A  76     D  70     B  64     E  34
     A  58     C 146     E  66     D  48     B  40
     D  60     B  16     A  66     E  66     C  72

Analyse the data and state your conclusions.

(Hint : take deviations from 76)

Ans. (b) $F = V_T/V_E = 27.10$, $F = V_R/V_E = 4.44$ and $F = V_C/V_E = 5.18$

Q. 2. Five varieties A, B, C, D and E of millet were tested
in a 5 x 5 L. S. D. The layout and the yields in lbs/plot are given
below. Analyse the data and test for the variation between the
different varieties.

     A 60     B 18     E 28     C 82     D 40
     B 38     A 30     C 89     D 32     E 33
     E 17     C 59     D 35     B 32     A 38
     C 73     D 24     A 29     E 33     B 20
     D 30     E 33     B  8     A 33     C 36

(Hint : take deviations from 38)

Ans. $V_T = 1531.7$, $V_C = 123.7$, $V_R = 258.5$ and $V_E = 149.2$
Q. 3. (a) What is a L. S. D.? Give the assumptions and
applications of a L. S. D. in field experimentation.

(b) A 4 x 4 L. S. was laid out to test the effects of various
fertilizers on the yield of potatoes. Here is the field plan with the
plot yields (bushels/acre).

The letters specify the treatments. Analyse the data and draw
your conclusions.

Row \ Col.      1         2         3         4
   1          A 423     B 428     C 452     D 390
   2          B 425     D 380     A 420     C 440
   3          C 460     A 414     D 375     B 425
   4          D 380     C 450     B 430     A 412

(Hint : Shift the origin to 419)

Ans. $V_T = 3312.5$, $V_C = 20.167$, $V_R = 35.833$ and $V_E = 41.75$

Q. 4. At a biological research centre, a research assistant
fed rabbits of four different breeds for a month with 4 types
of rations (A, B, C and D) and noted the gains in their weights
in ounces. He presented the results in a 4 x 4 L. S. The
layout and the gains in weight are given in the following table:

     A  5.0     B 12.0     C 13.1     D  8.0
     B 13.5     D 10.5     A 14.0     C 12.0
     C 14.0     A  8.5     D 10.0     B 13.0
     D  7.5     C 15.5     B 11.1     A 11.5

Analyse the data and write a brief report.

(Hint : take the deviations from 11.2)

Ans. $V_R = 6.09$, $V_C = 3.13$, $V_T = 19.18$ and $V_E = 5.68$


Q. 5. The following data show the results of a varietal trial
on wheat in a Latin square. The varieties are designated A, B, C,
D, E and F. The yields of the different plots are written below the
treatments.

Analyse the data and give your inferences.

Plan and yields of a varietal trial on wheat

      E      B      F      A      C      D
     433    327    452    190    304    216

      B      C      D      E      F      A
     289    275    215    288    371     82

      A      E      C      B      D      F
     184    281    283    222    134    446

      F      D      E      C      A      B
     420    248    305    239    123    184

      D      A      B      F      E      C
     252    232    211    417    394    266

      C      F      A      D      B      E
     300    305     59    166    126    220

Ans. $V_R = 10839.71$, $V_C = 4893.45$,
$V_T = 49635.984$ and $V_E = 1527.0515$
Q. 6. Carry out the analysis of variance of the following data
in a 3 x 3 Latin square.


\Col: 

\ 1 2 3 

Row \ 



Q. 7. Describe an experiment for comparing the effect of
5 different feeds on the milk yield of dairy cows using a Latin Square
Design. Give full details of the plan and conduct of the experiment
and explain the method of analysis of the results.

(M. Sc. Ag. Agra, 1956)

Hint: Let the 5 diets to be compared be A, B, C, D and
E. For conducting the experiment in a 5 x 5 L. S., we require
in all 25 dairy cows as our experimental material. To satisfy the
conditions of a L. S., these 25 cows should be classified with respect to
two factors of variation, e. g. the breed and the age group. Let the 5
breeds be $b_1, b_2, b_3, b_4$ and $b_5$ and the five age groups be 3-5,
5-7, 7-9, 9-11 and 11 and over years. There are 5 cows in each age
group and 5 of each breed, such that no two cows of the same breed are
of the same age group. Each of the five diets will be given to 5 cows
such that no two cows receiving the same diet are of the same breed
or of the same age group. The classification of cows with respect
to age group is represented here by columns and that by breed by rows,
and the diets A, B, C, D and E are assigned in such a way that each
occurs once in each row and each column. Thus the arrangement will be
as follows:

Row \ Col.      1        2        3        4         5
               3-5      5-7      7-9      9-11    11 & over
   b1           A        B        E        C         D
                1        2        3        4         5
   b2           B        A        C        D         E
                6        7        8        9        10
   b3           E        C        D        B         A
               11       12       13       14        15
   b4           C        D        A        E         B
               16       17       18       19        20
   b5           D        E        B        A         C
               21       22       23       24        25

For randomization and statistical analysis see theory.

The figures written below the letters denote the cow number.

















CHAPTER V


Analysis of Covariance


In the previous chapters, we have seen how the experimental
error is reduced by increasing the number of replications, refining the
experimental technique and grouping the experimental material into
homogeneous groups (blocks). There is yet one more method which
is found very useful in reducing the experimental error. This method
eliminates the contribution made to the experimental error by the
uncontrolled factors related to the yield. This elimination is possible
only when the uncontrolled factors can be measured quantitatively.
Thus the error can be controlled by measuring such characters
(factors) in addition to the yields; the statistical tool
for this control is the covariance analysis. The covariance analysis
makes use of the regression of yield on the related uncontrolled
factor for making a correction in the estimates of treatment differences
and experimental error. This control of error by the analysis of
covariance technique is called the statistical control of error. According
to Prof. R. A. Fisher, "analysis of covariance combines the
advantages and reconciles the requirements of the two very widely
applicable procedures known as regression and analysis of variance."

In field experimentation, the number of plants, tillers, prunings,
the age of the crop, etc. are the extraneous sources of variation
which contribute to the experimental error. The effect of such a
character varies from plot to plot randomly. The variate x associated
with the observation of the uncontrolled factor is called the
"concomitant or ancillary variate". This variate should be such
that it remains unaffected by the treatments; otherwise the results
will be misleading.
Applications:

(i) One of the most important applications of the analysis of
covariance is the control of errors that arise at random, known as
"statistical control of error".

(ii) Covariance analysis can be applied in sorting out the
regression effect.

(iii) The analysis of covariance also provides a unique method
to test the significance of the difference between two or more
regression coefficients.

(iv) Covariance analysis can successfully be used to carry out
the analysis of incomplete data (when one or more units are
missing in the data).

(v) The technique of covariance analysis is more effective in
reducing the experimental error than the process of grouping the
experimental material into homogeneous groups, as the former
provides more error d. f. than the latter.

Assumptions: The assumptions involved in the analysis of
covariance are:

(i) The effects of the different factors, i. e. treatments, groups and
regression, are additive.

(ii) Apart from the regression effect, the yields are distributed
normally and independently.

(iii) The concomitant variate x is not affected by the treatments.


Now we give the procedure of statistical control of error
using a single concomitant variate in the case of C. R. D., R. B. D.
and L. S. D.
(1) Case of C. R. D.

Suppose we have v treatments, the i-th treatment being
replicated $r_i$ times (i = 1, 2, ..., v), and the variate y denotes the
yield while x denotes the concomitant variate. The data from the
original layout can be arranged in the following tabular form:

Replication \ Treatment      1                 2            ...        v
                           y      x          y      x                y      x
     1                   y_11   x_11       y_21   x_21      ...    y_v1   x_v1
     2                   y_12   x_12       y_22   x_22      ...    y_v2   x_v2
     :                     :      :          :      :                :      :
    r_i                  y_1r1  x_1r1      y_2r2  x_2r2     ...    y_vrv  x_vrv
Totals                   T_1(y) T_1(x)     T_2(y) T_2(x)    ...    T_v(y) T_v(x)     G(y)  G(x)

Total no. of pairs $= N = \sum_i r_i$,

where $T_{i(y)}$ and $T_{i(x)}$ stand for the i-th treatment totals for the
y and x variates respectively. Further steps are as follows:

(i) Compute the S. S.$_x$ for treatments and error by the following
formulae:

[a] T. S. S.$_x = \sum_i \sum_j x_{ij}^2 - \text{C. F.}_x$, where C. F.$_x = G_{(x)}^2 / N$

[b] S. S.$_x$ (treat.) $= \sum_i T_{i(x)}^2 / r_i - \text{C. F.}_x = A_1$ (say)

and [c] S. S.$_x$ (error) = T. S. S.$_x$ - S. S.$_x$ (treat.) $= A$ (say)

(ii) Compute the S. S.$_y$ for treatments and error by the following
formulae:

[a] T. S. S.$_y = \sum_i \sum_j y_{ij}^2 - \text{C. F.}_y$, where C. F.$_y = G_{(y)}^2 / N$

[b] S. S.$_y$ (treat.) $= \sum_i T_{i(y)}^2 / r_i - \text{C. F.}_y = B_1$ (say)

and [c] S. S.$_y$ (error) = T. S. S.$_y$ - S. S.$_y$ (treat.) $= B$ (say)

(iii) Compute the sum of products S. P.$_{xy}$ for treatments and
error by the following formulae:

[a] T. S. P.$_{xy} = \sum_i \sum_j x_{ij} y_{ij} - \text{C. F.}_{xy}$,
where C. F.$_{xy} = G_{(x)} G_{(y)} / N$

[b] S. P.$_{xy}$ (treat.) $= \sum_i T_{i(x)} T_{i(y)} / r_i - \text{C. F.}_{xy} = C_1$ (say)

and [c] S. P.$_{xy}$ (error) = T. S. P.$_{xy}$ - S. P.$_{xy}$ (treat.) $= C$ (say)

(iv) Summarize the above results in the following tabular
form:

Source of variation   D. F.              S. S._x      S. S._y      S. P._xy     Reg. coefft. b
Treatment             v - 1 = $\nu_1$    A_1          B_1          C_1             -
Error                 N - v = $\nu$      A            B            C              C/A
Totals (T + E)        N - 1              (A_1 + A)    (B_1 + B)    (C_1 + C)       -

(v) Test the significance of the regression coefficient b = C/A in
the error line at 5% by computing the statistic

$F(1, \nu - 1) = \dfrac{C^2/A}{B - C^2/A} \times (\nu - 1)$

If this calculated F comes out to
be significant, the concomitant variate has an effect on the yield
and the covariance analysis will be used to eliminate its effect. On the
other hand, if F is not significant, the consideration of the
concomitant variate is useless and the analysis of variance will be
used to compare the treatments.

(vi) In case F is significant, obtain the adjusted S. S.$_y$ in
the following way:

[a] adjusted T. S. S.$_y = (B_1 + B) - \dfrac{(C_1 + C)^2}{A_1 + A} = S'$ (say)

[b] adjusted S. S.$_y$ (error) $= B - C^2/A = S_2$ (say)

[c] adjusted S. S.$_y$ (treat.) = adjusted T. S. S.$_y$ - adjusted S. S.$_y$ (error)
$= S' - S_2 = S_1$ (say)

(vii) Reduce by one the total and error d. f., as they are subject
to adjustment, while the d. f. of treatments remain unchanged.

(viii) Prepare the following A. V. T.:

Source of variation   D. F.           Adj. S. S.   M. S. S.                      F cal.         F tab. at 5%, 1%
Treatment             $\nu_1$         S_1          V'_T = S_1/$\nu_1$            V'_T / V'_E
Error                 $\nu - 1$       S_2          V'_E = S_2/($\nu - 1$)         -
Totals                N - 2           S'            -                             -

(ix) If the calculated F in the above table comes out to be significant,
compute the S. E. of the difference between two treatment means
and the C. D. by the following formulae:

S. E. of the difference $= \sqrt{V'_E \left[ \dfrac{1}{r_i} + \dfrac{1}{r_j} + \dfrac{(\bar{x}_i - \bar{x}_j)^2}{A} \right]}$,

where $\bar{x}_i$, $\bar{x}_j$ are the means of the i-th and j-th treatments for the
variate x.

(C. D.)$_{.05}$ = (S. E. of difference) $\times\, t_{.05}(\nu - 1)$

(x) Adjust the mean yields by the following formula:

$y'_i = \bar{y}_i - b(\bar{x}_i - \bar{x})$, where b = C/A,

$y'_i$ is the adjusted mean yield for the i-th treatment, $\bar{y}_i = T_{i(y)}/r_i$
and $\bar{x}$ is the general mean of x.

Arrange the adjusted mean yields in descending order of
their magnitudes and underline by a bar those pairs which do not
differ significantly from each other.

(xi) Finally, summarize the results obtained and comment on
them, if any, in the form of a report.
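Steps (v) to (viii) can be sketched in Python once the quantities A_1, B_1, C_1 (treatment line) and A, B, C (error line) are available. The function name `adjusted_analysis` and the dictionary keys are hypothetical; the figures passed to it are those of Exp. No. (1) worked out later in this chapter.

```python
def adjusted_analysis(a1, b1, c1, a, b, c, treat_df, error_df):
    """Covariance adjustment from the treatment line (a1, b1, c1)
    and the error line (a, b, c) of the S. S. and S. P. table."""
    slope = c / a                               # regression coefficient b = C/A
    regression_ss = c * c / a
    adj_error_ss = b - regression_ss            # S_2
    adj_error_df = error_df - 1                 # one d. f. lost to the regression
    adj_error_ms = adj_error_ss / adj_error_df  # V'_E
    adj_total_ss = (b1 + b) - (c1 + c) ** 2 / (a1 + a)   # S'
    adj_treat_ss = adj_total_ss - adj_error_ss           # S_1
    return {
        "slope": slope,
        "f_regression": regression_ss / adj_error_ms,    # F(1, error d. f. - 1)
        "adj_total_ss": adj_total_ss,
        "adj_error_ss": adj_error_ss,
        "adj_treat_ss": adj_treat_ss,
        "adj_error_ms": adj_error_ms,
        "f_treatments": (adj_treat_ss / treat_df) / adj_error_ms,
    }


result = adjusted_analysis(a1=86.3, b1=13.77, c1=-22.7,
                           a=165.2, b=15.16, c=-46.6,
                           treat_df=5, error_df=24)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The two F values are then compared with the tabulated F for (1, error d. f. - 1) and (treatment d. f., error d. f. - 1) respectively.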

(2) Case of R. B. D.

Suppose we have v treatments, each replicated r times. The
variate y denotes the yield and x the concomitant variate. The
data from the original layout can be arranged in the following
tabular form:

Blocks \ Treat.       1               2           ...         v              Totals
                    y     x         y     x                 y     x        B(y)     B(x)
   1              y_11  x_11      y_21  x_21      ...     y_v1  x_v1      B_1(y)   B_1(x)
   2              y_12  x_12      y_22  x_22      ...     y_v2  x_v2      B_2(y)   B_2(x)
   :                :     :         :     :                 :     :          :        :
   r              y_1r  x_1r      y_2r  x_2r      ...     y_vr  x_vr      B_r(y)   B_r(x)
Totals           T_1(y) T_1(x)   T_2(y) T_2(x)    ...    T_v(y) T_v(x)    G(y)     G(x)

where $B_{j(y)}$ and $B_{j(x)}$ stand for the j-th block totals for the
variates y and x respectively.
Further steps are as follows:

(i) Compute the S. S.$_x$ for treatments, blocks and error as given
below:

[a] T. S. S.$_x = \sum_i \sum_j x_{ij}^2 - \text{C. F.}_x$, where C. F.$_x = G_{(x)}^2 / N$ and N = vr

[b] S. S.$_x$ (treat.) $= \sum_i T_{i(x)}^2 / r - \text{C. F.}_x = A_1$ (say)

[c] S. S.$_x$ (blocks) $= \sum_j B_{j(x)}^2 / v - \text{C. F.}_x = A_2$ (say)

and [d] S. S.$_x$ (error) = T. S. S.$_x$ - S. S.$_x$ (treat. + blocks)
$= \text{T. S. S.}_x - (A_1 + A_2) = A$ (say)

(ii) Compute the S. S.$_y$ for treatments, blocks and error by the
formulae:

[a] T. S. S.$_y = \sum_i \sum_j y_{ij}^2 - \text{C. F.}_y$, where C. F.$_y = G_{(y)}^2 / N$

[b] S. S.$_y$ (treat.) $= \sum_i T_{i(y)}^2 / r - \text{C. F.}_y = B_1$ (say)

[c] S. S.$_y$ (blocks) $= \sum_j B_{j(y)}^2 / v - \text{C. F.}_y = B_2$ (say)

and [d] S. S.$_y$ (error) = T. S. S.$_y$ - S. S.$_y$ (treat. + blocks)
$= \text{T. S. S.}_y - (B_1 + B_2) = B$ (say)

(iii) Compute the S. P.$_{xy}$ for treatments, blocks and error by the
following formulae:

[a] T. S. P.$_{xy} = \sum_i \sum_j x_{ij} y_{ij} - \text{C. F.}_{xy}$, where C. F.$_{xy} = G_{(x)} G_{(y)} / N$

[b] S. P.$_{xy}$ (treat.) $= \sum_i T_{i(x)} T_{i(y)} / r - \text{C. F.}_{xy} = C_1$ (say)

[c] S. P.$_{xy}$ (blocks) $= \sum_j B_{j(x)} B_{j(y)} / v - \text{C. F.}_{xy} = C_2$ (say)

and [d] S. P.$_{xy}$ (error) = T. S. P.$_{xy}$ - S. P.$_{xy}$ (treat. + blocks)
$= \text{T. S. P.}_{xy} - (C_1 + C_2) = C$ (say)
(iv) Arrange the S. S. and S. P. for treatments and error only
in the following tabular form:

Source of variation        D. F.                       S. S._x      S. S._y      S. P._xy
Treatment                  v - 1 = $\nu_1$             A_1          B_1          C_1
Error                      (r - 1)(v - 1) = $\nu$      A            B            C
Totals (Treat. + Error)    $\nu_1 + \nu$               (A_1 + A)    (B_1 + B)    (C_1 + C)

Note: The remaining steps are the same as in the case of
C. R. D. except the formula for the S. E. of the difference between
two treatment means (i-th and j-th). Here, the S. E. of the difference
between the two treatment means is given by

S. E. of the difference $= \sqrt{V'_E \left[ \dfrac{2}{r} + \dfrac{(\bar{x}_i - \bar{x}_j)^2}{A} \right]}$

Obviously, the S. E. of the difference will be different for different
pairs of treatment means. For most practical purposes, it is
sufficient to use the average standard error to compare the treatment
means. It is given by the formula:

Average S. E. of difference $= \sqrt{\dfrac{2 V'_E}{r} \left[ 1 + \dfrac{A_1}{\nu_1 A} \right]}$
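The sums of squares and products of steps (i) to (iii) for an R. B. D. can be sketched in Python as below. The function name `rbd_sums` and the layout of its result are assumptions made for illustration; the data are the plant populations (x) and yields (y) of the cotton trial of Exp. No. (2) later in this chapter, one (x, y) pair per block for each variety.

```python
def rbd_sums(data):
    """Split the total S. S. and S. P. of an R. B. D. into treatment,
    block and error parts. data[treatment] lists one (x, y) per block."""
    plots_by_treat = list(data.values())
    v, r = len(plots_by_treat), len(plots_by_treat[0])
    plots_by_block = list(zip(*plots_by_treat))
    pairs = [p for plots in plots_by_treat for p in plots]

    def split(f, g):
        # f and g each pick x or y from a pair; f == g gives a sum of squares.
        cf = sum(map(f, pairs)) * sum(map(g, pairs)) / (v * r)
        total = sum(f(p) * g(p) for p in pairs) - cf
        treat = sum(sum(map(f, t)) * sum(map(g, t)) for t in plots_by_treat) / r - cf
        block = sum(sum(map(f, b)) * sum(map(g, b)) for b in plots_by_block) / v - cf
        return {"treat": treat, "block": block,
                "error": total - treat - block, "total": total}

    def get_x(pair):
        return pair[0]

    def get_y(pair):
        return pair[1]

    return {"xx": split(get_x, get_x), "yy": split(get_y, get_y),
            "xy": split(get_x, get_y)}


cotton = {
    "A": [(27, 60), (21, 63), (27, 60), (21, 66), (27, 57)],
    "B": [(33, 70), (15, 49), (27, 66), (23, 66), (24, 59)],
    "C": [(28, 64), (25, 64), (33, 64), (25, 61), (25, 56)],
    "D": [(24, 65), (22, 64), (21, 67), (19, 65), (26, 65)],
    "E": [(32, 65), (24, 62), (28, 47), (18, 61), (35, 61)],
}
sums = rbd_sums(cotton)
for kind, parts in sums.items():
    print(kind, {name: round(value, 2) for name, value in parts.items()})
```

In the notation of the text, `sums["xx"]` holds A_1, A_2 and A, `sums["yy"]` holds B_1, B_2 and B, and `sums["xy"]` holds C_1, C_2 and C.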
(3) Case of L. S. D.

Suppose we have a K x K Latin square as given below, in which
A, B, C, ..., K are the K treatments, y denotes the yield and x
the concomitant variate. $R_{i(y)}$ and $R_{i(x)}$ stand for the i-th row totals
for the variates y and x respectively, and similar meanings are
attached to $C_{j(y)}$ and $C_{j(x)}$ for the j-th column. The symbols
$T_{A(y)}$, $T_{A(x)}$, etc. denote the treatment totals for the treatment A, etc.

Row \ Col.         1                 2            ...          K                 Totals R
                  y, x              y, x                      y, x
                   A                 B            ...          D
   1           y_11, x_11        y_21, x_21       ...      y_K1, x_K1        R_1(y), R_1(x)
                   C                 A            ...          B
   2           y_12, x_12        y_22, x_22       ...      y_K2, x_K2        R_2(y), R_2(x)
   :                :                 :                         :                   :
                   D                 C            ...          K
   K           y_1K, x_1K        y_2K, x_2K       ...      y_KK, x_KK        R_K(y), R_K(x)
Totals C       C_1(y), C_1(x)    C_2(y), C_2(x)   ...      C_K(y), C_K(x)    G(y), G(x)

Treatment          A                 B            ...          K
Totals         T_A(y), T_A(x)    T_B(y), T_B(x)   ...      T_K(y), T_K(x)
Further steps are as follows:

(i) Compute the S. S.$_x$ for treatments, rows, columns and error
by the following formulae:

(a) T. S. S.$_x = \sum_i \sum_j x_{ij}^2 - \text{C. F.}_x$, where C. F.$_x = G_{(x)}^2 / N$ and $N = K^2$

(b) S. S.$_x$ (treat.) $= \sum T_{(x)}^2 / K - \text{C. F.}_x = A_1$ (say)

(c) S. S.$_x$ (rows) $= \sum_i R_{i(x)}^2 / K - \text{C. F.}_x = A_2$ (say)

(d) S. S.$_x$ (columns) $= \sum_j C_{j(x)}^2 / K - \text{C. F.}_x = A_3$ (say)

(e) S. S.$_x$ (error) = T. S. S.$_x$ - S. S.$_x$ (treat. + rows + columns)
$= \text{T. S. S.}_x - (A_1 + A_2 + A_3) = A$ (say)

(ii) Compute the S. S.$_y$ for treatments, rows, columns and error
in the following way:

(a) T. S. S.$_y = \sum_i \sum_j y_{ij}^2 - \text{C. F.}_y$, where C. F.$_y = G_{(y)}^2 / N$

(b) S. S.$_y$ (treat.) $= \sum T_{(y)}^2 / K - \text{C. F.}_y = B_1$ (say)

(c) S. S.$_y$ (rows) $= \sum_i R_{i(y)}^2 / K - \text{C. F.}_y = B_2$ (say)

(d) S. S.$_y$ (columns) $= \sum_j C_{j(y)}^2 / K - \text{C. F.}_y = B_3$ (say)

(e) S. S.$_y$ (error) = T. S. S.$_y$ - S. S.$_y$ (treat. + rows + columns)
$= \text{T. S. S.}_y - (B_1 + B_2 + B_3) = B$ (say)

(iii) Compute the S. P.$_{xy}$ for treatments, rows, columns and error
by the following formulae:

(a) T. S. P.$_{xy} = \sum_i \sum_j x_{ij} y_{ij} - \text{C. F.}_{xy}$, where C. F.$_{xy} = G_{(x)} G_{(y)} / N$

(b) S. P.$_{xy}$ (treat.) $= \sum T_{(x)} T_{(y)} / K - \text{C. F.}_{xy} = C_1$ (say)

(c) S. P.$_{xy}$ (rows) $= \sum_i R_{i(x)} R_{i(y)} / K - \text{C. F.}_{xy} = C_2$ (say)

(d) S. P.$_{xy}$ (columns) $= \sum_j C_{j(x)} C_{j(y)} / K - \text{C. F.}_{xy} = C_3$ (say)

(e) S. P.$_{xy}$ (error) = T. S. P.$_{xy}$ - S. P.$_{xy}$ (treat. + rows + columns)
$= \text{T. S. P.}_{xy} - (C_1 + C_2 + C_3) = C$ (say)
(iv) Arrange the S. S. and S. P. for treatments and error only
in the following tabular form:

Source of variation           D. F.                       S. S._x      S. S._y      S. P._xy
Treat.                        K - 1 = $\nu_1$             A_1          B_1          C_1
Error                         (K - 1)(K - 2) = $\nu$      A            B            C
Totals (Treatment + Error)    $\nu_1 + \nu$               (A_1 + A)    (B_1 + B)    (C_1 + C)

Note: The remaining steps are the same as in the case of
C. R. D. except the formula for the S. E. of the difference between
two treatment means (i-th and j-th). Here, the S. E. of the difference
between the two treatment means is given by the formula:

S. E. of the difference $= \sqrt{V'_E \left[ \dfrac{2}{K} + \dfrac{(\bar{x}_i - \bar{x}_j)^2}{A} \right]}$

Obviously, the S. E. of the difference for different pairs of
treatment means will be of different magnitudes. But, for most
practical purposes, it is sufficient to use the average S. E. to compare
the treatment means. It is given by the formula:

Average S. E. of difference $= \sqrt{\dfrac{2 V'_E}{K} \left[ 1 + \dfrac{A_1}{\nu_1 A} \right]}$



Exp. No. (1) In a feeding experiment, rabbits were fed
with six types of diets A, B, C, D, E and F. The following table gives
the gains in weight (y grams) and the quantity of food intake
(x calorie units):

       A             B             C             D             E             F
    x     y       x     y       x     y       x     y       x     y       x     y
   10    3.0      7    4.2     13    2.3      6    4.0      9    3.3     12    4.2
    8    3.5      6    4.8     10    2.5      9    3.1     14    1.7      8    4.6
   15    2.0      5    5.1     12    2.3      7    3.5     16    1.5      6    5.5
   11    3.2      9    3.5      7    3.7     10    2.9      8    4.1     11    3.8
    9    3.8      8    3.9      9    2.7      5    5.0     11    2.9     14    3.4

Test whether the gains are materially affected by the quantity of food
intake and whether the six types of diets differ significantly.
S. S. for x :

C. F.$_x = G_{(x)}^2 / N = (285)^2 / 30 = 2707.5$

T. S. S.$_x = \sum_i \sum_j x_{ij}^2 - \text{C. F.}_x = 2959 - 2707.5 = 251.5$

S. S.$_x$ (treat. or diet) $= \sum_i T_{i(x)}^2 / r_i - \text{C. F.}_x$
$= [(53)^2 + (35)^2 + \ldots + (51)^2]/5 - 2707.5$
$= 13969/5 - 2707.5 = 2793.8 - 2707.5 = 86.3$

S. S.$_x$ (error) = T. S. S.$_x$ - S. S.$_x$ due to treat.
= 251.5 - 86.3 = 165.2

S. S. for y :

C. F.$_y = G_{(y)}^2 / N = (104)^2 / 30 = 360.53$

T. S. S.$_y = \sum_i \sum_j y_{ij}^2 - \text{C. F.}_y = 389.46 - 360.53 = 28.93$

S. S.$_y$ (treat.) $= \sum_i T_{i(y)}^2 / r_i - \text{C. F.}_y$
$= [(15.5)^2 + (21.5)^2 + \ldots + (21.5)^2]/5 - 360.53$
$= 374.30 - 360.53 = 13.77$

S. S.$_y$ (error) = T. S. S.$_y$ - S. S.$_y$ due to treat.
= 28.93 - 13.77 = 15.16

S. P. for x and y :

C. F.$_{xy} = G_{(x)} G_{(y)} / N = (285 \times 104)/30 = 988$

T. S. P.$_{xy} = \sum_i \sum_j x_{ij} y_{ij} - \text{C. F.}_{xy} = 918.7 - 988 = -69.3$

S. P.$_{xy}$ (treat.) $= \sum_i T_{i(x)} T_{i(y)} / r_i - \text{C. F.}_{xy}$
$= [(53 \times 15.5) + \ldots + (51 \times 21.5)]/5 - 988$
$= 4826.5/5 - 988 = 965.3 - 988 = -22.7$

S. P.$_{xy}$ (error) = T. S. P.$_{xy}$ - S. P.$_{xy}$ due to treat.
= -69.3 - (-22.7) = -46.6
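The sums of squares and products above can be checked from the raw data with a short Python sketch. The helper name `crd_sums` is hypothetical; the data are the (x, y) pairs of the six diets as tabulated in this example.

```python
diets = {
    "A": [(10, 3.0), (8, 3.5), (15, 2.0), (11, 3.2), (9, 3.8)],
    "B": [(7, 4.2), (6, 4.8), (5, 5.1), (9, 3.5), (8, 3.9)],
    "C": [(13, 2.3), (10, 2.5), (12, 2.3), (7, 3.7), (9, 2.7)],
    "D": [(6, 4.0), (9, 3.1), (7, 3.5), (10, 2.9), (5, 5.0)],
    "E": [(9, 3.3), (14, 1.7), (16, 1.5), (8, 4.1), (11, 2.9)],
    "F": [(12, 4.2), (8, 4.6), (6, 5.5), (11, 3.8), (14, 3.4)],
}


def crd_sums(groups, f, g):
    """Treatment and error parts of the S. S. (f == g) or S. P. (f != g)
    in a C. R. D.; f and g each pick x or y from an (x, y) pair."""
    pairs = [p for plots in groups.values() for p in plots]
    cf = sum(map(f, pairs)) * sum(map(g, pairs)) / len(pairs)
    total = sum(f(p) * g(p) for p in pairs) - cf
    treat = sum(sum(map(f, plots)) * sum(map(g, plots)) / len(plots)
                for plots in groups.values()) - cf
    return treat, total - treat


def get_x(pair):
    return pair[0]


def get_y(pair):
    return pair[1]


A1, A = crd_sums(diets, get_x, get_x)   # S. S. for x
B1, B = crd_sums(diets, get_y, get_y)   # S. S. for y
C1, C = crd_sums(diets, get_x, get_y)   # S. P. for x and y
print(f"A1 = {A1:.2f}, A = {A:.2f}")
print(f"B1 = {B1:.2f}, B = {B:.2f}")
print(f"C1 = {C1:.2f}, C = {C:.2f}")
```

Dividing each treatment total by its own number of replications keeps the sketch valid for a C. R. D. with unequal replication as well.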
Now we prepare the following table of unadjusted S. S. and
S. P. for treatment and error only:

Source of variation   D. F.   S. S._x   S. S._y   S. P._xy   Reg. coefft. (b)
Treat. (T)              5      86.3     13.77     -22.7         -
Error (E)              24     165.2     15.16     -46.6      -46.6/165.2 = -0.2821
Totals (T + E)         29     251.5     28.93     -69.3         -

To test the significance of the regression coefficient b in the
error line, we compute the statistic

$F(1, 23) = \dfrac{(-46.6)^2/165.2}{15.16 - (-46.6)^2/165.2} \times 23 = \dfrac{13.1450 \times 23}{15.16 - 13.1450} = \dfrac{302.335}{2.015} = 150.0421$

∴ F(1, 23) ≈ 150, and $F_{.05}(1, 23) = 4.28$

Thus we have cal. F > tab. F at 5%, leading to the conclusion
that the quantity of food intake has a significant effect on the gains
in weight of the rabbits, and the negative value of the regression
coefficient (-0.2821) indicates a negative correlation between x and
y on the average.

To test the effect of diets, we compute the corrected (adjusted)
S. S. for y (gains in weight) with the help of the following table:

Source of variation   Unadj. S. S._y   Adjusting factor               Adj. S. S._y = unadj. S. S._y - adj. factor
Treat.                    13.77        (by subtraction)                   7.8196
Error                     15.16        (-46.6)^2/165.2 = 13.1450          2.015
Totals (T + E)            28.93        (-69.3)^2/251.5 = 19.0954          9.8346
A. V. T. for adjusted S. S.$_y$ :

Source of variation   D. F.   Adj. S. S._y   M. S. S._y         F cal.     F at 5%   F at 1%
Treat.                  5       7.8196       1.56392            17.85**     2.64      3.94
Error                  23       2.015        0.0876 = V'_E        -          -         -
Totals                 28       9.8346         -                  -          -         -

The significant value of F corresponding to the treatments (diets)
indicates that the diets are not homogeneous but differ significantly.

Finally, to compare the diets, we compute the adjusted mean
gains in weight of the rabbits due to diets A, B, C, D, E and F and also
their C. D. with the help of the following table:

$\bar{x} = 285/30 = 9.5$, $b = -46.6/165.2 = -0.2821$

Diet   $\bar{x}_i = T_{i(x)}/r_i$   $(\bar{x}_i - \bar{x})$   $b(\bar{x}_i - \bar{x})$   $\bar{y}_i = T_{i(y)}/r_i$   Adjusted mean $y'_i = \bar{y}_i - b(\bar{x}_i - \bar{x})$
A           10.6                      1.1                     -0.3103                    3.1                      3.4103
B            7.0                     -2.5                      0.7053                    4.3                      3.5947
C           10.2                      0.7                     -0.1975                    2.7                      2.8975
D            7.4                     -2.1                      0.5924                    3.7                      3.1076
E           11.6                      2.1                     -0.5924                    2.7                      3.2924
F           10.2                      0.7                     -0.1975                    4.3                      4.4975

(i) S. E . of difference between the two adjusted means of F 
and C. 

= / y ' r 1 i ( *6 ~ ^ 3 ) 2 1 1 ] 
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where r t =r 3 = 5 
V' E =0-0876 


(adjusted error variance) A = 1 65-2 


=y [4 + 51F +4] 

=V0 , 0876 x 0-4=v'0-03504=0-11872 
( C . D.)=(S. E. of difference) x t im =0 872) X2&9 =0-3873 
5% -05 

But y t '—y 3 '= 4-4975— 2-8975= 1-60 

As the difference between the adjusted average values of F and 
C is greater than their (C. £>), the diet F differs significantly from C 

•05 

at 5% level of significance. 

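(ii) S.E. of the difference between the two adjusted means of F and A

$= \sqrt{V'_E\left[\dfrac{1}{r_6} + \dfrac{1}{r_1} + \dfrac{(\bar{x}_6 - \bar{x}_1)^2}{A}\right]}$

$= \sqrt{0.0876\left[\dfrac{1}{5} + \dfrac{1}{5} + \dfrac{(10.2 - 10.6)^2}{165.2}\right]} = \sqrt{0.0876 \times 0.4010} = \sqrt{0.03513} = 0.1874$

$(C.D.)_{5\%} = (\text{S.E. of difference}) \times t_{.05}(23) = 0.1874 \times 2.069 = 0.3877$

But $\bar{y}'_6 - \bar{y}'_1 = 4.4975 - 3.4103 = 1.0872$

Evidently, the difference between the adjusted average values of
F and A is greater than their C.D. at 5%,
i.e. 1.0872 > 0.3877, so the diet F differs significantly from A
at the 5% level.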
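
The pairwise comparison above can be checked numerically. The following is only a minimal sketch: the function and variable names are ours (hypothetical), and the figures are those quoted in the example (b = -46.6/165.2, grand mean of x = 9.5, adjusted error variance 0.0876, error S.S. of x = 165.2, t = 2.069 for 23 d.f.).

```python
import math

def adjusted_mean(y_mean, x_mean, grand_x_mean, b):
    """Adjusted treatment mean: y_mean - b * (x_mean - grand_x_mean)."""
    return y_mean - b * (x_mean - grand_x_mean)

def se_adjusted_difference(v_e, r_i, r_j, x_mean_i, x_mean_j, error_ss_x):
    """S.E. of the difference between two adjusted treatment means."""
    spread = (x_mean_i - x_mean_j) ** 2 / error_ss_x
    return math.sqrt(v_e * (1 / r_i + 1 / r_j + spread))

# Figures quoted in the diet example (assumed to be read correctly).
b = -46.6 / 165.2
adj_f = adjusted_mean(4.3, 10.2, 9.5, b)   # diet F
adj_a = adjusted_mean(3.1, 10.6, 9.5, b)   # diet A
se_fa = se_adjusted_difference(0.0876, 5, 5, 10.2, 10.6, 165.2)
cd_fa = se_fa * 2.069                      # tabulated t for 23 d.f. at 5%

print(round(adj_f, 4), round(adj_a, 4), round(se_fa, 4), round(cd_fa, 4))
print("F differs from A at 5%:", adj_f - adj_a > cd_fa)
```

The same two functions serve for any other pair of diets; only the means of x and y for the pair change.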

Similarly, the remaining pairs may be compared after
computing the C.D. for each pair.

Inference: The six diets produced different gains in the
weight of rabbits, and the diet F is the best of all.

Note: The calculations in the above example may be easier if
the deviations are taken from y = 3.0 and x = 11 for the two variates
respectively.

Exp. No. (2) A varietal trial on cotton was laid out in 5 blocks
and 5 varieties of cotton, namely A, B, C, D and E, were tested. On
each plot the total plant population (denoted by x) and the cotton yield
(denoted by y, in Kgms.) were recorded. The following table gives
the data:

                       Blocks
Variety      1          2          3          4          5
             x    y     x    y     x    y     x    y     x    y
A            27   60    21   63    27   60    21   66    27   57
B            33   70    15   49    27   66    23   66    24   59
C            28   64    25   64    33   64    25   64    25   56
D            24   65    22   64    21   67    19   65    26   65
E            32   65    24   62    28   47    18   61    35   61


Test whether the yields are materially affected by the plant
population per plot and whether the five varieties differ significantly.


Solution:

H0: The data are homogeneous with respect to the blocks and
the varieties.


For convenience in calculations, we prepare the following table
after taking deviations of x from x = 25 and of y from y = 60 Kgms.
respectively.


                          Blocks                                        Totals
Variety        1          2           3          4           5          T(x), T(y)
               x    y     x     y     x    y     x     y     x    y
A              2    0     -4    3     2    0     -4    6     2    -3    -2,   6
B              8    10    -10   -11   2    6     -2    6     -1   -1    -3,   10
C              3    4     0     4     8    4     0     4     0    -4    11,   12
D              -1   5     -3    4     -4   7     -6    5     1    5     -13,  26
E              7    5     -1    2     3    -13   -7    1     10   1     12,   -4
Totals
B(x), B(y)     19   24    -18   2     11   4     -19   22    12   -2    5,    50




S.S. for x:

$C.F._x = \dfrac{[G(x)]^2}{N} = \dfrac{(5)^2}{25} = 1$

$T.S.S._x = \sum_i \sum_j x_{ij}^2 - C.F._x = (2)^2 + (8)^2 + \dots + (10)^2 - 1 = 561 - 1 = 560$

$S.S._x \text{ (blocks)} = \sum_j \dfrac{B_j^2(x)}{5} - C.F._x = \dfrac{(19)^2 + (-18)^2 + (11)^2 + (-19)^2 + (12)^2}{5} - 1 = 262.2 - 1 = 261.2$

$S.S._x \text{ (treat. or variety)} = \sum_i \dfrac{T_i^2(x)}{5} - C.F._x = \dfrac{(-2)^2 + (-3)^2 + \dots + (12)^2}{5} - 1 = 89.4 - 1 = 88.4$

$S.S._x \text{ (error)} = T.S.S._x - S.S._x \text{ due to (blocks + treat.)} = 560 - (261.2 + 88.4) = 560 - 349.6 = 210.4$

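S.S. for y:

$C.F._y = \dfrac{[G(y)]^2}{N} = \dfrac{(50)^2}{25} = 100$

$T.S.S._y = \sum_i \sum_j y_{ij}^2 - C.F._y = (0)^2 + (10)^2 + \dots + (1)^2 - 100 = 768 - 100 = 668$

$S.S._y \text{ (blocks)} = \sum_j \dfrac{B_j^2(y)}{5} - C.F._y = \dfrac{(24)^2 + (2)^2 + \dots + (-2)^2}{5} - 100 = 216.8 - 100 = 116.8$

$S.S._y \text{ (treat.)} = \sum_i \dfrac{T_i^2(y)}{5} - C.F._y = \dfrac{(6)^2 + (10)^2 + \dots + (-4)^2}{5} - 100 = 194.4 - 100 = 94.4$

$S.S._y \text{ (error)} = T.S.S._y - S.S._y \text{ due to (blocks + treat.)} = 668 - (116.8 + 94.4) = 668 - 211.2 = 456.8$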

S.P. for x and y:

$C.F._{xy} = \dfrac{G(x)\,G(y)}{N} = \dfrac{5 \times 50}{25} = 10$

$T.S.P._{xy} = \sum_i \sum_j x_{ij} y_{ij} - C.F._{xy} = (2 \times 0) + (8 \times 10) + \dots + (10 \times 1) - 10 = 120 - 10 = 110$

$S.P._{xy} \text{ (blocks)} = \sum_j \dfrac{B_j(x) B_j(y)}{5} - C.F._{xy} = \dfrac{(19 \times 24) + \dots + (12 \times -2)}{5} - 10 = 4.4 - 10 = -5.6$

$S.P._{xy} \text{ (treat.)} = \sum_i \dfrac{T_i(x) T_i(y)}{5} - C.F._{xy} = \dfrac{(-2 \times 6) + \dots + (12 \times -4)}{5} - 10 = -59.2 - 10 = -69.2$

$S.P._{xy} \text{ (error)} = T.S.P._{xy} - S.P._{xy} \text{ due to (blocks + treat.)} = 110 - (-5.6 - 69.2) = 184.8$

Now we prepare the following table for unadjusted S.S. and
S.P. for treatments and error only:

Source of             D.F.   S.S.x    S.S.y    S.P.xy   Reg. coefficient b
variation
Treatments            4      88.4     94.4     -69.2    -
Error                 16     210.4    456.8    184.8    184.8/210.4 = 0.8783
Totals
(Treat. + error)      20     298.8    551.2    115.6    -


To test the significance of the regression coefficient b in the
error line, we compute the statistic:

$F(1, 15) = \dfrac{(184.8)^2 / 210.4}{\left[456.8 - (184.8)^2 / 210.4\right] / 15} = \dfrac{162.3148 \times 15}{294.4852} = \dfrac{2434.7220}{294.4852} = 8.2677$

$\therefore F(1, 15) = 8.2677$ and $F_{.05}(1, 15) = 4.54$

The positive value of the regression coefficient b indicates that there
is a positive correlation between x (plant population per plot) and y (yield
of cotton), and the significant value of F corresponding to the regression
coefficient shows that the plant population per plot has a significant
effect on the yield of cotton.
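
The F-ratio for the regression coefficient in the error line can be sketched as a small function. The names below are ours (hypothetical); the inputs are the error-line S.S. of x, S.S. of y and S.P. of the table above, and one degree of freedom is taken from the error for the regression.

```python
def regression_f(error_ss_x, error_ss_y, error_sp_xy, error_df):
    """F-ratio for the error regression of y on x in an analysis of covariance.

    Returns (F, residual d.f.); the regression itself has 1 d.f.
    """
    regression_ss = error_sp_xy ** 2 / error_ss_x   # the adjusting factor
    residual_ss = error_ss_y - regression_ss        # adjusted error S.S.
    residual_df = error_df - 1
    return regression_ss / (residual_ss / residual_df), residual_df

# Error line of the cotton example: S.S.x, S.S.y, S.P.xy with 16 d.f.
f_value, f_df = regression_f(210.4, 456.8, 184.8, 16)
print(round(f_value, 4), f_df)
```

The value returned is to be compared with the tabulated F for (1, residual d.f.), which still has to be read from the F-table.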
To test the effect of treatments (varieties), we compute the
adjusted S.S.y with the help of the following table:

Source of       Unadj.    Adjusting factor                Adj. S.S.y = unadj. S.S.y
variation       S.S.y                                     - adj. factor
Treat.          94.4      -                               211.9914 (by subtraction)
Error           456.8     (184.8)^2/210.4 = 162.3148      456.8 - 162.3148 = 294.4852
Totals (T+E)    551.2     (115.6)^2/298.8 = 44.7234       551.2 - 44.7234 = 506.4766


A.V.T. for adjusted S.S.y:

Source of       D.F.   S.S.y       M.S.S.y            F cal.    F at 5%   F at 1%
variation
Treatments      4      211.9914    52.99785           2.6995    3.06      4.89
Error           15     294.4852    19.6323 = V'E      -         -         -
Totals          19     506.4766    -                  -         -         -

The calculated value of F corresponding to the treatments
(varieties) comes out to be insignificant, and so the varieties
A, B, C, D and E are homogeneous.
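
The adjustment of the treatment S.S. described above (adjust the error line and the treatment + error line, then subtract) can be sketched as follows. This is an illustration only; the function names are ours (hypothetical) and each line is passed as a triple (S.S. of x, S.S. of y, S.P. of xy).

```python
def adjusted_ss(ss_x, ss_y, sp_xy):
    """Adjusted S.S. of y = unadjusted S.S. of y - (S.P.)^2 / S.S. of x."""
    return ss_y - sp_xy ** 2 / ss_x

def ancova_treatment_f(treat, error, treat_df, error_df):
    """Adjusted treatment S.S., adjusted error variance and F for treatments."""
    total = tuple(t + e for t, e in zip(treat, error))   # treatment + error line
    adj_error = adjusted_ss(*error)
    adj_total = adjusted_ss(*total)
    adj_treat = adj_total - adj_error                    # by subtraction
    v_e = adj_error / (error_df - 1)                     # 1 d.f. lost to regression
    return adj_treat, v_e, (adj_treat / treat_df) / v_e

# Cotton example in randomised blocks: (S.S.x, S.S.y, S.P.xy) for each line.
adj_treat, v_e, f_treat = ancova_treatment_f(
    treat=(88.4, 94.4, -69.2), error=(210.4, 456.8, 184.8),
    treat_df=4, error_df=16)
print(round(adj_treat, 4), round(v_e, 4), round(f_treat, 4))
```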
Now we need to compute only the adjusted mean yields ($\bar{y}'_i$),
which are shown in the following table:

$\bar{x} = \dfrac{5}{25} + 25 = 25.2, \quad b = 0.8783$

Variety   $\bar{x}_i = T_i(x)/r$   $(\bar{x}_i - \bar{x})$   $b(\bar{x}_i - \bar{x})$   $\bar{y}_i = T_i(y)/r$   $\bar{y}'_i = \bar{y}_i - b(\bar{x}_i - \bar{x})$
A         24.6                     -0.6                      -0.5270                    61.2                     61.7270
B         24.4                     -0.8                      -0.7026                    62.0                     62.7026
C         27.2                     2.0                       1.7566                     62.4                     60.6434
D         22.4                     -2.8                      -2.4592                    65.2                     67.6592
E         27.4                     2.2                       1.9323                     59.2                     57.2677


Performance of yields:

Variety:                        D         B         A         C         E
Adj. mean yield (in Kgms.):     67.6592   62.7026   61.7270   60.6434   57.2677

The variety D is the best of all since it gives the maximum adjusted
mean yield of cotton.

Inference: The varieties of cotton do not differ significantly,
and the variety D is the best of all.






Exp. No. (3) An experiment on cotton was carried out in a
5 x 5 L.S. to test the homogeneity of five varieties of cotton, namely
A, B, C, D and E. On each plot the yield (y Kgms.) and the total
number of plants (x) were recorded as follows:

Row 1    A     B     C     D     E
   x     27    21    27    21    27
   y     60    63    60    66    57

Row 2    B     C     D     E     A
   x     33    15    27    23    24
   y     70    49    66    66    59

Row 3    C     D     E     A     B
   x     28    25    33    25    25
   y     64    64    64    64    56

Row 4    D     E     A     B     C
   x     24    22    21    19    26
   y     65    64    67    65    65

Row 5    E     A     B     C     D
   x     32    24    28    18    35
   y     65    62    47    61    61

Test whether the yield (y) is materially affected by the total
number of plants per plot and whether the varieties of cotton differ
significantly.





Solution:


H0: The data are homogeneous.

Taking the deviations from x = 25 and y = 60, we prepare the
following table:



S.S. for x:

$C.F._x = \dfrac{[G(x)]^2}{N} = \dfrac{(5)^2}{25} = 1$

$T.S.S._x = \sum_i \sum_j x_{ij}^2 - C.F._x = (2)^2 + (8)^2 + \dots + (10)^2 - 1 = 561 - 1 = 560$

$S.S._x \text{ (rows)} = \sum_i \dfrac{R_i^2(x)}{5} - C.F._x = \dfrac{(-2)^2 + (-3)^2 + \dots + (12)^2}{5} - 1 = 89.4 - 1 = 88.4$

$S.S._x \text{ (columns)} = \sum_j \dfrac{C_j^2(x)}{5} - C.F._x = \dfrac{(19)^2 + (-18)^2 + \dots + (12)^2}{5} - 1 = 262.2 - 1 = 261.2$

$S.S._x \text{ (treat. or variety)} = \sum_t \dfrac{T_t^2(x)}{5} - C.F._x = \dfrac{(-4)^2 + (1)^2 + (-11)^2 + (7)^2 + (12)^2}{5} - 1 = 66.2 - 1 = 65.2$

$S.S._x \text{ (error)} = T.S.S._x - S.S._x \text{ due to (rows + columns + treats.)} = 560 - (88.4 + 261.2 + 65.2) = 145.2$

S.S. for y:

$C.F._y = \dfrac{[G(y)]^2}{N} = \dfrac{(50)^2}{25} = 100$

$T.S.S._y = \sum_i \sum_j y_{ij}^2 - C.F._y = (0)^2 + (10)^2 + \dots + (1)^2 - 100 = 768 - 100 = 668$

$S.S._y \text{ (rows)} = \sum_i \dfrac{R_i^2(y)}{5} - C.F._y = \dfrac{(6)^2 + (10)^2 + \dots + (-4)^2}{5} - 100 = 194.4 - 100 = 94.4$

$S.S._y \text{ (columns)} = \sum_j \dfrac{C_j^2(y)}{5} - C.F._y = \dfrac{(24)^2 + (2)^2 + \dots + (-2)^2}{5} - 100 = 216.8 - 100 = 116.8$

$S.S._y \text{ (treat.)} = \sum_t \dfrac{T_t^2(y)}{5} - C.F._y = \dfrac{(12)^2 + (1)^2 + (-1)^2 + (22)^2 + (16)^2}{5} - 100 = 177.2 - 100 = 77.2$

$S.S._y \text{ (error)} = T.S.S._y - S.S._y \text{ due to (rows + columns + treats.)} = 668 - (94.4 + 116.8 + 77.2) = 379.6$

S.P. for x and y:

$C.F._{xy} = \dfrac{G(x)\,G(y)}{N} = \dfrac{5 \times 50}{25} = 10$

$T.S.P._{xy} = \sum_i \sum_j x_{ij} y_{ij} - C.F._{xy} = (2 \times 0) + (8 \times 10) + \dots + (10 \times 1) - 10 = 120 - 10 = 110$

$S.P._{xy} \text{ (rows)} = \sum_i \dfrac{R_i(x) R_i(y)}{5} - C.F._{xy} = \dfrac{(-2 \times 6) + \dots + (12 \times -4)}{5} - 10 = -59.2 - 10 = -69.2$

$S.P._{xy} \text{ (columns)} = \sum_j \dfrac{C_j(x) C_j(y)}{5} - C.F._{xy} = \dfrac{(19 \times 24) + \dots + (12 \times -2)}{5} - 10 = 4.4 - 10 = -5.6$

$S.P._{xy} \text{ (treats.)} = \sum_t \dfrac{T_t(x) T_t(y)}{5} - C.F._{xy} = \dfrac{(-4 \times 12) + (1 \times 1) + (-11 \times -1) + (7 \times 22) + (12 \times 16)}{5} - 10 = 62.0 - 10 = 52.0$

$S.P._{xy} \text{ (error)} = T.S.P._{xy} - S.P._{xy} \text{ due to (rows + columns + treats.)} = 110 - (-69.2 - 5.6 + 52.0) = 132.8$
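
The error line of the Latin square can also be obtained directly from the layout. The sketch below is illustrative only: the function name is ours (hypothetical), the data are entered as raw values rather than deviations (the sums of squares and products are unaffected by the change of origin), and the yield in the fourth row, second column is taken as 64, the value consistent with the totals used in the solution.

```python
def latin_square_error_sp(layout, x, y):
    """Error sum of products of x and y in a K x K Latin square.

    Passing the same table for x and y gives the error sum of squares.
    """
    k = len(layout)
    cells = [(i, j) for i in range(k) for j in range(k)]
    cf = sum(map(sum, x)) * sum(map(sum, y)) / (k * k)
    total = sum(x[i][j] * y[i][j] for i, j in cells) - cf
    rows = sum(sum(x[i]) * sum(y[i]) for i in range(k)) / k - cf
    cols = sum(sum(r[j] for r in x) * sum(r[j] for r in y) for j in range(k)) / k - cf
    tx, ty = {}, {}
    for i, j in cells:                      # accumulate treatment totals
        t = layout[i][j]
        tx[t] = tx.get(t, 0) + x[i][j]
        ty[t] = ty.get(t, 0) + y[i][j]
    treats = sum(tx[t] * ty[t] for t in tx) / k - cf
    return total - rows - cols - treats

layout = ["ABCDE", "BCDEA", "CDEAB", "DEABC", "EABCD"]
x = [[27, 21, 27, 21, 27], [33, 15, 27, 23, 24], [28, 25, 33, 25, 25],
     [24, 22, 21, 19, 26], [32, 24, 28, 18, 35]]
y = [[60, 63, 60, 66, 57], [70, 49, 66, 66, 59], [64, 64, 64, 64, 56],
     [65, 64, 67, 65, 65], [65, 62, 47, 61, 61]]

e_xx = latin_square_error_sp(layout, x, x)
e_yy = latin_square_error_sp(layout, y, y)
e_xy = latin_square_error_sp(layout, x, y)
print(round(e_xx, 1), round(e_yy, 1), round(e_xy, 1), round(e_xy / e_xx, 4))
```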

Now we prepare the following table for unadjusted
S.S. and S.P. for treatments and error only:

Source of       D.F.   S.S.x    S.S.y    S.P.xy   Reg. coefft. b
variation
Treat.          4      65.2     77.2     52.0     -
Error           12     145.2    379.6    132.8    132.8/145.2 = 0.9146
Totals (T+E)    16     210.4    456.8    184.8    -

To test the significance of the regression coefficient b in the
error line, we compute the statistic:

$F(1, 11) = \dfrac{(132.8)^2 / 145.2}{\left[379.6 - (132.8)^2 / 145.2\right] / 11} = \dfrac{121.4590 \times 11}{258.1410} = \dfrac{1336.0490}{258.1410} = 5.1757$

$F(1, 11) = 5.1757$ and $F_{.05}(1, 11) = 4.84$

The positive value of the regression coefficient b indicates that there
is a positive correlation between x (no. of plants per plot) and y (yield of
cotton), and the significant value of F corresponding to the regression
coefficient shows that the plant population per plot has a significant
effect on the yield of cotton.

To test the effect of treatments (varieties), we compute the
adjusted S.S.y with the help of the following table:

Source of       Unadj.    Adjusting factor                Adj. S.S.y = unadj. S.S.y
variation       S.S.y                                     - adj. factor
Treat.          77.2      -                               36.3442 (by subtraction)
Error           379.6     (132.8)^2/145.2 = 121.4590      379.6 - 121.4590 = 258.1410
Totals (T+E)    456.8     (184.8)^2/210.4 = 162.3148      456.8 - 162.3148 = 294.4852


A.V.T. for adjusted S.S.y:

Source of       D.F.   S.S.y       M.S.S.y    F cal.   F at 5%   F at 1%
variation
Treat.          4      36.3442     9.08605    2.58     5.93      14.45
Error           11     258.1410    23.4674    -        -         -
Totals          15     294.4852    -          -        -         -

Here the treatment M.S.S. is smaller than the error M.S.S., so F cal. = 23.4674/9.08605 = 2.58 is compared with the tabulated F for (11, 4) d.f.

The calculated F corresponding to the treatments (varieties) comes
out to be insignificant, and so the varieties A, B, C, D and E are
homogeneous.

Now we need to compute only the adjusted mean yields ($\bar{y}'_i$),
which are shown in the following table:

$\bar{x} = 25.2, \quad b = 0.9146$

Variety   $\bar{x}_i = T_i(x)/r$   $(\bar{x}_i - \bar{x})$   $b(\bar{x}_i - \bar{x})$   $\bar{y}_i = T_i(y)/r$   $\bar{y}'_i = \bar{y}_i - b(\bar{x}_i - \bar{x})$
A         24.2                     -1.0                      -0.9146                    62.4                     63.3146
B         25.2                     0                         0                          60.2                     60.2
C         22.8                     -2.4                      -2.1950                    59.8                     61.9950
D         26.4                     1.2                       1.0975                     64.4                     63.3025
E         27.4                     2.2                       2.0121                     63.2                     61.1879

Performance of yields:

Variety:                        A         D         C         E         B
Adj. mean yield (in Kgms.):     63.3146   63.3025   61.9950   61.1879   60.2

Inference: The varieties do not differ significantly, and the
variety A is the best of all.

Exercise V

Q. 1. What is a concomitant variate and how can it be used for
improving the precision of experimental results? Explain with
examples and indicate the method of statistical analysis of the results.

(M. Sc. Ag. Agra, 1956)

Q. 2. Discuss the uses of analysis of covariance in
experimentation.

In a randomized block experiment with r replications, the
degrees of freedom, sums of squares and sum of products for
treatments and error are denoted as follows:

              D.F.   S(x^2)   S(xy)   S(y^2)
Treatments    p      A1       B1      C1
Error         q      A2       B2      C2

Outline briefly the procedure for carrying out the adjusted
analysis of variance for y.

(M. Sc. Ag. Agra, 1962)

Q. 3. In a R.B.D. with 4 replications and 6 treatments the
following results were recorded:

Source       D.F.   S.S.y (unadj.)   S.S.x (unadj.)   S.P.xy (unadj.)
Treatment    -      10777.5          13.5             112.5
Error        -      15698.2          36.0             248.0

Test whether the yield (y) is materially affected by the plant
population (x) per plot and whether the six treatments differ significantly.

Ans. F(1, 14) = 5.37, 3.43



CHAPTER VI 


Analysis of Incomplete Observations. 

Or 

MISSING PLOT TECHNIQUE 


In field experimentation, whatever care the experimenter may take
in designing and conducting the experiment, the yields of some plots
may not be obtained correctly. They may be damaged by cattle,
wild animals and birds, or seriously affected by some pest, disease or
flood, etc. Sometimes the yields of some
plots may be affected by some factor which does not affect the others.
Such plots, affected by extraneous sources, would not provide
unbiased comparisons, and hence their yields have to be omitted from
the analysis of the data. Whether the yields are missing or
have been omitted, the data are incomplete and their statistical analysis
is somewhat more complex than that of complete data. Here
we shall deal only with the case of one missing observation, for
R.B.D. and L.S.D.

(i) Analysis of R.B.D. with one missing unit:

Suppose we have v treatments, each replicated r times in
the original plan, and the yield of the (i, j)th plot is missing. The yields
can be arranged in the following tabular form:

                        Blocks
Treat.     1      2      3      ...    j      ...    r      Totals
1          y11    y12    y13    ...    y1j    ...    y1r    T1
2          y21    y22    y23    ...    y2j    ...    y2r    T2
3          y31    y32    y33    ...    y3j    ...    y3r    T3
...
i          yi1    yi2    yi3    ...    X      ...    yir    T
...
v          yv1    yv2    yv3    ...    yvj    ...    yvr    Tv
Totals     B1     B2     B3     ...    B      ...    Br     G

where X denotes the missing value, T is the total of the ith
treatment and B is the total of the jth block (replicate) containing the
missing value.

G is the grand total of the (vr - 1) values.

There are two methods of analysing the above data:

(a) Bartlett's Method: Bartlett first made use of the analysis
of covariance technique in analysing incomplete data. For a
single missing value, the method consists in giving the value zero to
the missing yield, then supposing an imaginary variate x which
assumes the value zero for all the plots except the missing plot, where
it takes the value -1, and finally applying the analysis of covariance with the yield as the dependent variate.

Standard Error (S.E.): If the treatments come out to be
significant at the α% level of significance, then the S.E. of the difference
between two treatment means, neither of which has the missing value, is
$\sqrt{2V'_E / r}$, where $V'_E$ is the adjusted error variance, and that of the difference
between two treatment means, one of which has the missing
value, is $\sqrt{\dfrac{V'_E}{r}\left[2 + \dfrac{v}{(r-1)(v-1)}\right]}$.

Mean of the treatment (for which the value is missing):

The mean of the missing-value treatment is
$\dfrac{1}{r}\left[T + \dfrac{rB + vT - G}{(r-1)(v-1)}\right]$, and the rest of the procedure is the same as in the
case of complete data.
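
Bartlett's device can be tried on a small layout. The sketch below is illustrative only: the function name is ours (hypothetical), rows are blocks and columns are varieties, and the figures are those of a 4-block, 3-variety trial in which the plot of block 3, variety 2 is missing (entered as y = 0 with dummy variate x = -1). Under these assumptions the error regression coefficient turns out to be the estimate of the missing yield, and the adjusted error S.S. is the error S.S. of the completed data.

```python
def rbd_error_sp(x, y):
    """Error sum of products in a randomised block layout.

    x, y: lists of rows (blocks), each a list over treatments.
    Passing the same table twice gives the error sum of squares.
    """
    n_blocks, n_treats = len(x), len(x[0])
    cells = [(i, j) for i in range(n_blocks) for j in range(n_treats)]
    cf = sum(map(sum, x)) * sum(map(sum, y)) / (n_blocks * n_treats)
    total = sum(x[i][j] * y[i][j] for i, j in cells) - cf
    blocks = sum(sum(x[i]) * sum(y[i]) for i in range(n_blocks)) / n_treats - cf
    treats = sum(sum(r[j] for r in x) * sum(r[j] for r in y)
                 for j in range(n_treats)) / n_blocks - cf
    return total - blocks - treats

y = [[15, 17, 15], [12, 19, 18], [10, 0, 17], [15, 20, 16]]   # missing yield set to 0
x = [[0, 0, 0], [0, 0, 0], [0, -1, 0], [0, 0, 0]]             # dummy variate

e_xx = rbd_error_sp(x, x)
e_xy = rbd_error_sp(x, y)
e_yy = rbd_error_sp(y, y)
b = e_xy / e_xx                       # estimate of the missing yield
adj_error_ss = e_yy - e_xy ** 2 / e_xx
print(round(b, 2), round(adj_error_ss, 3))
```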

Exp. No. (1) In the following table, we are given the number
of pods per plant. Three plants are selected at random from each of
the four blocks, but the yield of the plant of block number 3 with
variety V2 could not be recorded. Carry out the analysis of the
following incomplete data and state your conclusions.

                  Variety
Blocks     V1     V2         V3     Totals
B1         15     17         15     47
B2         12     19         18     49
B3         10     X          17     27 = B
B4         15     20         16     51
Totals     52     56 = T     66     174 = G

Solution:

H0: The data are homogeneous with respect to the varieties
and the blocks.

The statistical analysis will be done in the following tabular
form:

                        Variety
Blocks          V1           V2                V3           Totals
B1       y      15 (225)     17 (289)          15 (225)     47 (739)
         x      0            0                 0            0
B2       y      12 (144)     19 (361)          18 (324)     49 (829)
         x      0            0                 0            0
B3       y      10 (100)     0 (0)             17 (289)     27 = B (389)
         x      0            -1                0            -1
B4       y      15 (225)     20 (400)          16 (256)     51 (881)
         x      0            0                 0            0
Totals   y      52 (694)     56 = T (1050)     66 (1094)    174 = G (2838)
         x      0            -1                0            -1

Now we compute the C.F.s as follows:

$C.F._x = \dfrac{(\sum x)^2}{N} = \dfrac{(-1)^2}{12} = 0.083$

$C.F._y = \dfrac{(\sum y)^2}{N} = \dfrac{(174)^2}{12} = \dfrac{30276}{12} = 2523.0$

$C.F._{xy} = \dfrac{\sum x \cdot \sum y}{N} = \dfrac{(-1)(174)}{12} = -14.5$

The figures within brackets indicate the squares of the yields y.

S.S. for x:

$T.S.S._x = \sum_i \sum_j x_{ij}^2 - C.F._x = 1 - 0.083 = 0.917$

$S.S._x \text{ due to blocks} = \sum_j \dfrac{B_j^2(x)}{3} - C.F._x = \dfrac{(0)^2 + (0)^2 + (-1)^2 + (0)^2}{3} - 0.083 = 0.333 - 0.083 = 0.250$

$S.S._x \text{ due to varieties} = \sum_i \dfrac{T_i^2(x)}{4} - C.F._x = \dfrac{(0)^2 + (-1)^2 + (0)^2}{4} - 0.083 = 0.250 - 0.083 = 0.167$

$S.S._x \text{ due to error} = T.S.S._x - S.S._x \text{ due to (blocks + varieties)} = 0.917 - (0.250 + 0.167) = 0.917 - 0.417 = 0.500$

S.S. for y:

$T.S.S._y = \sum_i \sum_j y_{ij}^2 - C.F._y = (15)^2 + (17)^2 + \dots + (16)^2 - 2523.0 = 2838 - 2523 = 315.0$

$S.S._y \text{ (blocks)} = \sum_j \dfrac{B_j^2}{3} - C.F._y = \dfrac{(47)^2 + (49)^2 + (27)^2 + (51)^2}{3} - 2523.0 = \dfrac{2209 + 2401 + 729 + 2601}{3} - 2523.0 = \dfrac{7940}{3} - 2523.0 = 2646.67 - 2523.0 = 123.67$

$S.S._y \text{ (varieties)} = \sum_i \dfrac{T_i^2}{4} - C.F._y = \dfrac{(52)^2 + (56)^2 + (66)^2}{4} - 2523.0 = \dfrac{2704 + 3136 + 4356}{4} - 2523.0 = \dfrac{10196}{4} - 2523.0 = 2549 - 2523 = 26.00$

$S.S._y \text{ (error)} = T.S.S._y - S.S._y \text{ due to (blocks + varieties)} = 315.00 - (123.67 + 26.00) = 165.33$

S.P. (sum of products) for x, y:

$T.S.P._{xy} = \sum_i \sum_j x_{ij} y_{ij} - C.F._{xy} = 0 - (-14.5) = 14.5$

Source of       D.F.   S.S.x    S.S.y     S.P.xy
variation
Blocks          3      0.250    123.67    5.5
Varieties       2      0.167    26.00     0.5
Error           6      0.500    165.33    8.5
Totals          11     0.917    315.00    14.5

The adjusting factors and adjusted S.S. for y are computed in the following table for treatments
(varieties) and error only:

(i) S.E. of the difference between two variety means (other than V2)

$= \sqrt{\dfrac{2V'_E}{r}} = \sqrt{\dfrac{2 \times 4.16}{4}} = \sqrt{2.08} = 1.4422$

(where no variety has a missing value)

(ii) S.E. of the difference between two variety means
(V2 and any other)

$= \sqrt{\dfrac{V'_E}{r}\left[2 + \dfrac{v}{(r-1)(v-1)}\right]} = \sqrt{\dfrac{4.16}{4}\left[2 + \dfrac{3}{3 \times 2}\right]} = \sqrt{2.6} = 1.6124$

(where one variety has one missing value)

(i) $(C.D.)_{5\%} = (\text{S.E. of difference}) \times t_{.05}(5) = 1.4422 \times 2.571 = 3.7079 \approx 3.71$

(ii) $(C.D.)_{5\%} = (\text{S.E. of difference}) \times t_{.05}(5) = 1.6124 \times 2.571 = 4.1455 \approx 4.15$
The variety means are calculated as follows:

mean of $V_1 = \dfrac{52}{4} = 13.0$

mean of $V_3 = \dfrac{66}{4} = 16.5$

mean of $V_2 = \dfrac{1}{4}\left[56 + \dfrac{4 \times 27 + 3 \times 56 - 174}{3 \times 2}\right] = \dfrac{1}{4}[56 + 17] = 18.25$

(since V2 has one missing value)

Now we arrange the variety means in descending order of their
magnitudes:

Variety:   V2       V3       V1
Mean:      18.25    16.50    13.00

Inference: The varieties V1, V2 and V3 differ significantly in
yielding capacity at the 5% level. We also note that V2 has the maximum
yielding capacity, followed by V3, but their difference is not significant.
The least yield has been observed in the case of variety V1, which
does not differ significantly from V3 but differs significantly
from V2.

(b) Substitution Method: The method of substitution consists
of the following steps:

(i) Estimate the missing value by using the formula
$X = \dfrac{rB + vT - G}{(r-1)(v-1)}$

(ii) Substitute this value of X for the missing plot yield and
compute the S.S. as if no value were missing.

(iii) Subtract the quantity $\dfrac{(B + vT - G)^2}{v(v-1)(r-1)^2}$, or equivalently $\dfrac{[B - (v-1)X]^2}{v(v-1)}$, from the sum of
squares due to treatments to get the adjusted sum of squares for the
treatments.

(iv) Reduce the d.f. for total and error by one, as one missing
value is estimated from the data.

(v) Prepare the analysis of variance table and test the
significance of the treatment effect.

(vi) Calculate the S.E. of the difference between two
treatment means, neither of which contains the missing value, by the
relation

S.E. of difference $= \sqrt{\dfrac{2V_E}{r}}$, and that of the difference
between two treatment means, one of which contains the missing
value, by the formula

S.E. of difference $= \sqrt{V_E\left(\dfrac{1}{r_1} + \dfrac{1}{r_2}\right)}$, where $r_1$ and $r_2$ are the effective numbers of replications of the two treatments, if the
treatments show significant effects.
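
Steps (i) and (iii) of the substitution method reduce to two short formulas, sketched below. The function names are ours (hypothetical); r is the number of blocks, v the number of treatments, and B, T, G are the block, treatment and grand totals of the observed values. The figures in the usage lines are those of the pods-per-plant example (r = 4, v = 3, B = 27, T = 56, G = 174).

```python
def rbd_missing_value(r, v, block_total, treat_total, grand_total):
    """Estimate of a single missing plot in a randomised block design."""
    return (r * block_total + v * treat_total - grand_total) / ((r - 1) * (v - 1))

def rbd_treatment_ss_correction(v, block_total, estimate):
    """Quantity subtracted from the treatment S.S. of the completed data."""
    return (block_total - (v - 1) * estimate) ** 2 / (v * (v - 1))

x_hat = rbd_missing_value(r=4, v=3, block_total=27, treat_total=56, grand_total=174)
correction = rbd_treatment_ss_correction(v=3, block_total=27, estimate=x_hat)
print(x_hat, round(correction, 3))
```

The second function uses the form [B - (v-1)X]^2 / v(v-1); it gives the same value as (B + vT - G)^2 / [v(v-1)(r-1)^2].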


Exp. No. (2): Estimate the missing value in example no. (1)
and test the significance of the difference between the means of the
varieties.

Solution:

H0: The data are homogeneous with respect to varieties and
blocks.

First of all, we estimate the missing value in the data by the
formula

$X = \dfrac{rB + vT - G}{(r-1)(v-1)} = \dfrac{4 \times 27 + 3 \times 56 - 174}{3 \times 2} = \dfrac{102}{6} = 17.0$

Substituting this estimated value for the missing yield, we take
the deviations from y = 16 for our convenience and prepare the
following table (the figures within brackets are the squares):

                     Variety
Block        V1           V2          V3          Totals = B      (Totals)^2
B1           -1 (1)       1 (1)       -1 (1)      -1 (3)          1
B2           -4 (16)      3 (9)       2 (4)       1 (29)          1
B3           -6 (36)      1 (1)       1 (1)       -4 (38)         16
B4           -1 (1)       4 (16)      0 (0)       3 (17)          9
Totals = T   -12 (54)     9 (27)      2 (6)       -1 = G (87)
(Totals)^2   144          81          4           1

Variety    V1      V2       V3
Mean       13      18.25    16.5

$C.F. = \dfrac{G^2}{N} = \dfrac{(-1)^2}{12} = 0.083$

$T.S.S. = \sum_i \sum_j y_{ij}^2 - C.F. = 87 - 0.083 = 86.917$

$S.S. \text{ (blocks)} = \sum_j \dfrac{B_j^2}{3} - C.F. = \dfrac{1 + 1 + 16 + 9}{3} - 0.083 = 9 - 0.083 = 8.917$

$S.S. \text{ (varieties)} = \sum_i \dfrac{T_i^2}{4} - C.F. = \dfrac{144 + 81 + 4}{4} - 0.083 = 57.250 - 0.083 = 57.167$

$S.S. \text{ (error)} = T.S.S. - S.S. \text{ due to (blocks + varieties)} = 86.917 - (8.917 + 57.167) = 86.917 - 66.084 = 20.833$

The adjusting factor for variety S.S.

$= \dfrac{(B + vT - G)^2}{v(v-1)(r-1)^2}$ or $\dfrac{[B - (v-1)X]^2}{v(v-1)} = \dfrac{[27 - 2 \times 17]^2}{3 \times 2} = \dfrac{49}{6} = 8.167$

S.S. (varieties) adjusted = S.S. (varieties) unadj. - adj. factor
= 57.167 - 8.167 = 49.00

Now we arrive at the following A.V.T.:

Source of           D.F.   S.S.      M.S.S.           F cal.                    F tab. at 5%   F tab. at 1%
variation
Blocks              3      8.917     2.9723           4.1666/2.9723 = 1.40      9.01           28.24
Varieties (adj.)    2      49.00     24.5             24.5/4.1666 = 5.88*       5.79           13.27
Error               5      20.833    4.1666 = V_E     -                         -              -

The calculated value of F corresponding to the varieties comes
out to be significant at the 5% level, and it necessitates the further
computation of the S.E.s and C.D.s as given below:

(i) S.E. of the difference between two variety means
(other than V2)

$= \sqrt{\dfrac{2V_E}{r}} = \sqrt{\dfrac{2 \times 4.1666}{4}} = \sqrt{2.0833} = 1.4433$

(where no variety has a missing unit)

(ii) S.E. of the difference between two variety means
(V2 and any other)

$= \sqrt{4.1666\left(\dfrac{1}{3} + \dfrac{1}{3.5}\right)} = \sqrt{4.1666 \times 0.6190} = \sqrt{2.579125} = 1.6059$

(where one variety contains one missing unit)

(i) $(C.D.)_{5\%} = (\text{S.E. of difference}) \times t_{.05}(5) = 1.4433 \times 2.571 = 3.7107 \approx 3.71$

(ii) $(C.D.)_{5\%} = (\text{S.E. of difference}) \times t_{.05}(5) = 1.6059 \times 2.571 = 4.1288 \approx 4.13$

Now we arrange the varieties according to their performance:

Variety:   V2       V3       V1
Mean:      18.25    16.50    13.00

Inference: Here we draw the same conclusions as given in
Exp. No. (1).
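
The factor 0.6190 used in (ii) agrees with 1/3 + 1/3.5, which is what an effective-replication rule gives. The sketch below rests on that reading, which is our assumption and is not spelled out in the text: the variety with the missing unit counts r - 1 replications, while the other variety counts 1 for every block where both are present and (v - 2)/(v - 1) for the block where its partner is missing. The function names are ours (hypothetical).

```python
import math

def rbd_se_complete(v_e, r):
    """S.E. of the difference between two means with no missing unit."""
    return math.sqrt(2 * v_e / r)

def rbd_se_with_missing(v_e, r, v):
    """S.E. of the difference when one of the two treatments lost a plot.

    Assumed effective-replication rule (see the lead-in above).
    """
    r_missing = r - 1                        # treatment that lost one plot
    r_other = (r - 1) + (v - 2) / (v - 1)    # partner treatment
    return math.sqrt(v_e * (1 / r_missing + 1 / r_other))

se_complete = rbd_se_complete(4.1666, r=4)
se_missing = rbd_se_with_missing(4.1666, r=4, v=3)
t_05 = 2.571                                 # tabulated t for 5 d.f. at 5%
print(round(se_complete, 4), round(se_missing, 4))
print(round(se_complete * t_05, 2), round(se_missing * t_05, 2))
```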
(ii) Analysis of Latin Square Design with one missing unit:
Suppose we have a K x K L.S.D. with the (i, j)th plot
yield missing and

R is the total of the row containing the missing value,

C is the total of the column containing the missing value,

T is the treatment total containing the missing value and
G is the grand total of the (K^2 - 1) observations.

The analysis of this incomplete design can be carried out by
using the analysis of covariance technique, but the substitution
method is simpler and more rapid than the former. The steps
involved in the substitution method are:

(i) Estimate the missing value by the formula
$X = \dfrac{K(R + C + T) - 2G}{(K-1)(K-2)}$

(ii) Substitute this value of X for the missing observation and
calculate the S.S. in the usual way.

(iii) Subtract the quantity $\dfrac{[(K-1)T + R + C - G]^2}{[(K-1)(K-2)]^2}$ from the
treatment S.S.

(iv) Reduce the total and error d.f. by one, since one missing
value is estimated from the data.

(v) Prepare the analysis of variance table to test the
homogeneity of the data.

(vi) Compute the S.E. of the difference between two
treatment means, neither of which is attached with the missing value,
by using the relation

S.E. of the difference $= \sqrt{\dfrac{2V_E}{K}}$, and that of the difference between two treatment means,
one of which is attached with the missing value, by the formula

S.E. of the difference $= \sqrt{V_E\left(\dfrac{1}{r_1} + \dfrac{1}{r_2}\right)}$, where $r_1$ and $r_2$ are the effective numbers of replications of the two treatments, if the
treatments show significant effect.
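
Steps (i) and (iii) for the Latin square can be sketched in the same way as for randomised blocks. The function names are ours (hypothetical); the totals in the usage lines are illustrative figures for a 4 x 4 square (R = 758, C = 739, T = 755, G = 3746) and a 5 x 5 square (R = 396, C = 386, T = 347, G = 2147).

```python
def lsd_missing_value(k, row_total, col_total, treat_total, grand_total):
    """Estimate of a single missing plot in a K x K Latin square."""
    return (k * (row_total + col_total + treat_total) - 2 * grand_total) / ((k - 1) * (k - 2))

def lsd_treatment_ss_correction(k, row_total, col_total, treat_total, grand_total):
    """Quantity subtracted from the treatment S.S. of the completed data."""
    numerator = ((k - 1) * treat_total + row_total + col_total - grand_total) ** 2
    return numerator / ((k - 1) * (k - 2)) ** 2

x4 = lsd_missing_value(4, 758, 739, 755, 3746)
c4 = lsd_treatment_ss_correction(4, 758, 739, 755, 3746)
x5 = lsd_missing_value(5, 396, 386, 347, 2147)
c5 = lsd_treatment_ss_correction(5, 396, 386, 347, 2147)
print(round(x4, 2), round(c4, 4), round(x5, 4), round(c5, 4))
```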
Exp. No. (3): Estimate the missing value in the following
L.S.D. and carry out the analysis of variance to test the significance
of the difference between the treatment means.

B 249     D 245     A 249     C 244
A 254     C 249     D 240     B 252
D 245     B 254     C 250     A 257
C 251     A 261     B -       D 246

Solution:

H0: The data are homogeneous.

The missing value is estimated by the formula

$X = \dfrac{K(R + T + C) - 2G}{(K-1)(K-2)}$,

where K = 4, R = 758, C = 739, T = 755 and G = 3746 from the
data.

$X = \dfrac{4(758 + 755 + 739) - 2 \times 3746}{3 \times 2} = \dfrac{9008 - 7492}{6} = \dfrac{1516}{6} = 252.66 \approx 252.7$

Using the estimated value 252.7 for the missing unit and taking
the deviations from y = 250, we prepare the following table to
compute the sums of squares (the figures within brackets are the squares):

                            Col.
Row              1            2             3                4            Totals = R            (Totals = R)^2
1                B -1 (1)     D -5 (25)     A -1 (1)         C -6 (36)    -13 (63)              169
2                A 4 (16)     C -1 (1)      D -10 (100)      B 2 (4)      -5 (121)              25
3                D -5 (25)    B 4 (16)      C 0 (0)          A 7 (49)     6 (90)                36
4                C 1 (1)      A 11 (121)    B 2.7 (7.29)     D -4 (16)    10.7 (145.29)         114.49
Totals = C       -1 (43)      9 (163)       -8.3 (108.29)    -1 (105)     -1.3 = G (419.29)     1.69
(Totals = C)^2   1            81            68.89            1

Treat.     A            B              C           D
Total      21 (441)     7.7 (59.29)    -6 (36)     -24 (576)
Mean       255.25       251.925        248.5       244.0

$C.F. = \dfrac{G^2}{N} = \dfrac{(-1.3)^2}{16} = 0.105625 \approx 0.1056$

$T.S.S. = \sum_i \sum_j y_{ij}^2 - C.F. = 419.29 - 0.1056 = 419.1844$

$S.S. \text{ (columns)} = \sum_j \dfrac{C_j^2}{4} - C.F. = \dfrac{1 + 81 + 68.89 + 1}{4} - 0.1056 = \dfrac{151.89}{4} - 0.1056 = 37.9725 - 0.1056 = 37.8669$

$S.S. \text{ (rows)} = \sum_i \dfrac{R_i^2}{4} - C.F. = \dfrac{169 + 25 + 36 + 114.49}{4} - 0.1056 = \dfrac{344.49}{4} - 0.1056 = 86.1225 - 0.1056 = 86.0169$

$S.S. \text{ (treat.)} = \dfrac{T_A^2 + T_B^2 + T_C^2 + T_D^2}{4} - C.F. = \dfrac{441 + 59.29 + 36 + 576}{4} - 0.1056 = 278.0725 - 0.1056 = 277.9669$

$S.S. \text{ (error)} = T.S.S. - S.S. \text{ due to (rows + columns + treat.)} = 419.1844 - (86.0169 + 37.8669 + 277.9669) = 419.1844 - 401.8507 = 17.3337$

The adjusting factor for treatment S.S.

$= \dfrac{[(K-1)T + R + C - G]^2}{[(K-1)(K-2)]^2} = \dfrac{[3 \times 755 + 758 + 739 - 3746]^2}{[3 \times 2]^2} = \dfrac{(16)^2}{(6)^2} = \dfrac{256}{36} = 7.1111$

adjusted S.S. (treat.) = unadj. S.S. (treat.) - adj. factor
= 277.9669 - 7.1111 = 270.8558
Now we arrive at the following A.V.T.:

Source of         D.F.   S.S.        M.S.S.           F cal.     F at 5%   F at 1%
variation
Rows              3      86.0169     28.6723          8.27*      5.41      12.06
Columns           3      37.8669     12.6223          3.64       5.41      12.06
Treat. (adj.)     3      270.8558    90.2853          26.05**    5.41      12.06
Error             5      17.3337     3.4665 = V_E     -          -         -
Totals            14     -           -                -          -         -

The calculated value of F corresponding to rows comes out to
be significant at the 5% level and that of treatments is highly significant.
The insignificant value of F corresponding to columns indicates that
the design has no improvement over the R.B.D. in this example. In
order to test the significance of the difference between two
treatment means, we compute below the S.E.s and C.D.s:

(i) The S.E. of the difference between two treatment means
(other than B) is given by the formula

$\sqrt{\dfrac{2V_E}{K}} = \sqrt{\dfrac{2 \times 3.4665}{4}} = \sqrt{1.73325} = 1.3165$

(where no treatment has a missing value)

(ii) The S.E. of the difference between two treatment means
(B and any other) is given by the formula

$\sqrt{V_E\left(\dfrac{1}{r_1} + \dfrac{1}{r_2}\right)} = \sqrt{3.4665\left(\dfrac{1}{3} + \dfrac{3}{10}\right)} = \sqrt{3.4665 \times 0.6333} = \sqrt{2.19533445} = 1.4816$

(where one treatment has one missing unit)

(i) $(C.D.)_{.05} = (\text{S.E. of difference}) \times t_{.05}(5) = 1.3165 \times 2.571 = 3.3847 \approx 3.385$

and $(C.D.)_{.01} = (\text{S.E. of difference}) \times t_{.01}(5) = 1.3165 \times 4.032 = 5.3081 \approx 5.308$

(ii) $(C.D.)_{.05} = (\text{S.E. of difference}) \times t_{.05}(5) = 1.4816 \times 2.571 = 3.8092 \approx 3.809$

and $(C.D.)_{.01} = (\text{S.E. of difference}) \times t_{.01}(5) = 1.4816 \times 4.032 = 5.9738 \approx 5.974$


Now we arrange the treatment means in decreasing order
of their magnitudes:

Treatment:   A         B          C        D
Mean:        255.25    251.925    248.5    244.0

Inference: The maximum yield has been recorded in the case
of treatment A and the minimum in the case of D. The difference between A and
D (11.25) is significant at both the 5% and 1% levels, whereas the difference between
C and D (4.5) is significant at 5% and not at 1%. Treatment B, which carries the missing unit, does not differ significantly from A or from C.

Exp. No. (4): Compare the advantages and disadvantages of
the R.B. and L.S. designs in field trials. In a trial of five varieties
of wheat, A, B, C, D and E, laid out in a L.S., the following yields
(in oz. per plot) were obtained:

                                                         Totals
B 90      E 80      C 134     A -       D 92             396
E 85      D 84      B 70      C 141     A 82             462
C 111     A 90      D 87      B 84      E 69             441
A 81      C 125     E 85      D 76      B 72             439
D 82      B 60      A 94      E 85      C 88             409
Totals    449       439       470       386       403    2147

(The column totals are 449, 439, 470, 386 and 403 for columns 1 to 5, and 2147 is the grand total.)

The yield of the plot under variety A in the first row was lost on
account of damage by cattle. Calculate the missing value and the
adjustment to the treatment sum of squares in the analysis of variance
of the completed data. State the formula for the S.E. for comparison
of two varieties involving variety A.

(M. Sc. Ag. Agra, 1958)


Solution— 

For the first part of the question, see theory. 

Ho : The data is homogeneous. 

The missing value is estimated by the formula— 

, * K(R+C+T)-2G 

X ~ (AT— 1)(AT— 2) 

where K= 5, 11=396, C=386, T= 347 and <7=2147 from the 

data. 

. 5(396+3*86+347)— 2x2147 

*= 4x3 


5645 -4294 1351 

12 ~ 12 


112-5833;= 1126 



Substituting this eotimated value 112*6 for the missing .yield and taking the deviations from y= 90 
(for convenience in calculations), we get the data in the following tabular form— 
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N 

as 

II 

3 

H 

343396 

% 

144 

00 

131 

1681 

9216 
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C. ^^=^^=3-63611 
N 5 

T. S. S.=rz s y* {i -C. F. = 9463 • 76 — 3 - 6864 = 9460*0736 

i j 

t> 2 

S. S. (rows)=S~-^— C. F. 
i -* 


_ 3433 ' 96+j 44 + 8 1 + 121 + 16£1 _ 3 . 6g64 

= 1092 - 1920 - 3 - 6864 = 1088'5056 

rz, 

S.S. (columns)=S — C. F. 
j 5 

= 1 + 1 21 + 40 0 + 2 351 - 96+2209 _ 3 . 6g64 
= 1018 - 5920 — 3 . 6864 = 1014-9056 


t* a 

S. S. (varieties) =2 — — — C. F. 

92 * 16 + 5476+22201 + 841+2116 __ 3 . G864 
5 

= 6145 - 2320 — 3 - 6864=6141 -5456 


S. S. (error) =T. S. S.-S. S. due to (rows.+columns+ 

varieties) 

= 94600736 -( 1088 - 5056 + 1014 - 9056 + 6141 - 5456 ) 
= 9460 - 0736 - 8244-6568 = 1215-1168 

The adjusting factor for variety (treatment) S. S. is given by 
the formula— 


- , _ [(K-l)T±R+C-MP 
adjusting factor — ■ ^ lj(l^ 2)p 


[ 4x 347 + 396 + 386 - 2147] 2 
[4 X 3] 2 


( 23 )* 

-( 12) 2 


= 3-6736 


adjusted S. S. (varieties)=unadjusted S. S. (varieties) -adj. 

factor 

= 6141 * 5456 - 3-6736 
= 6137-8720 
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Now we arrive at the following A. V. T.— 


Source of           D. F.   S. S.        M. S. S.     F cal.     F at 5%   F at 1%
variation
Rows                4       1088.5056    272.1264     2.46       3.36      5.67
Columns             4       1014.9056    253.7264     2.29       3.36      5.67
Varieties (adj.)    4       6137.8720    1534.4680    13.89**    3.36      5.67
Error               11      1215.1168    110.4652 = V_E


The calculated value of F corresponding to the varieties comes
out to be highly significant, while those for rows and columns are
both insignificant, showing that the Latin square design gives no
improvement over C. R. D. in this example.

The formula for the S. E. of the difference between the variety- 
means involving the variety A (which has a missing unit) is given 
by- 

S. E. of difference^ V E * 

where V~ is the error variance (error mean square) 

£> 

= /^/nO-4652 ( t+ 

= y/~l 104652 *TlT = V 53d 0826923 

=7-2875 

Result : The missing value = 112.6 approx.

The varieties differ significantly at 1% level of significance.
The S. E. of the difference between the two variety-means involving
A, obtained from the formula stated above, comes out as 7.2875.
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EXERCISE VI 

Q. (1) : (a) What are the methods of estimating a missing 
value in R. B. D. ? 

(b) The table gives the results of a randomised block experiment 
in which the observation indicated by X is missing. Estimate the 
missing value and analyse the data ? 

Treat.     Replication                                    Totals
           I      II     III    IV     V      VI
A :        17     29     25     17     33     23          144
B :        19     23     X      15     23     19          99
C :        33     35     29     25     37     27          186
Totals     69     87     54     57     93     69          429

Ans. (b) X = 19.2, V_E = 7.438, V_T (adj.) = 174.90

Q. (2) : The yields/plot (in kgms) are recorded below on the
basis of an experiment performed on four varieties of barley each
with 6 replications. But the yield for the variety A could not be
recorded in the sixth plot as it was lost on account of damage by
cattle. Estimate the missing yield and carry out the analysis
of variance with adjusted sum of squares for variety and state your
conclusions ?

Variety    Replications                                   Totals
           1      2      3      4      5      6
A :        15     17     15     17     19     a           83
B :        21     19     15     19     17     17          108
C :        19     17     17     21     19     17          110
D :        21     23     19     25     22     17          127
Totals     76     76     66     82     77     51          428

Ans. a = 14, V_E = 2.61905, V_T (adj.) = 23.3611




Q. (3) A feeding experiment was conducted on a dozen cows
of four different breeds which were grouped into 3 such that
each group consists of 4 cows of different breeds. The cows in the
3 groups were subjected to three types of diets (treatments)
A, B and C and the increases in milk-yields (ounces/cow) after a
week were recorded as follows:

Diets/breeds    I       II      III     IV
A :             8.0     26.0    18.0    21.0
B :             22.0    25.0    24.0    36.0
C :             11.0    27.0    13.0    X

where X denotes the missing yield for the cow of the 4th breed
which was supplied the diet C.

(i) Estimate the missing yield and carry out the analysis of
variance to test the significance of difference between the 4 breeds
and the 3 diets ?

(ii) State the formulae for comparison of two diets involving
the diet C and excluding the diet C ?

(iii) Write down the adjusting factor to get the adjusted
treatment sum of squares ?

Ans : (i) X = 25, V_E = 22.9667, V_T (adj.) = 84.50
(ii) see theory for the S. E. of difference of treatment-means,
(iii) [B - (v - 1)X]^2 / [v(v - 1)]

Q. (4) : Estimate the missing value (denoted by X) in the
following 4x4 L. S. D. and carry out the analysis of variance to test
the homogeneity of the data and also state the formulae for S. E. of
difference between the two treatment-means involving the treatment which
has one missing value and excluding it ?

C 25     B 23     A 20     D 20
A 19     D 19     C 21     B 18
B 19     A X      D 17     C 20
D 17     C 20     B 20     A 15

where letters stand for treatments.

Ans. X = 17.0, V_E = 1.0, V_T (adj.) = 10.1852
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Q. 5. In a trial of five varieties of wheat A, B, C, D and E, 
laid out in a Latin Square, the following yields (in oz./plot) were 
obtained. But the yield for the variety A in the second row and 
5th column was lost on account of damage by animals. 

(i) Calculate the missing yield and the adjusting factor to the
treatment (variety) S. S. in the analysis of variance of the completed
data ?

(ii) Analyse the data and interpret your results obtained ?

                                                        Totals
A 50     B 70     C 70     D 80     E 90                360
B 70     C 90     D 80     E 80     A (missing)         320
C 60     D 50     E 90     A 80     B 90                370
D 50     E 60     A 80     B 50     C 70                310
E 80     A 90     B 50     C 85     D 60                365
Totals 310    360      370      375      310            1725

Ans. (i) X = 100, adj. factor = 76.5625
(ii) V_E = 216.3636, V_T (adj.) = 270.8594

Q. 6. We are giving below some results of a 5x5 L. S.
experiment conducted on 25 cows of 5 different breeds and of 5
lactation periods fed with 5 types of rations A, B, C, D and E, for
a month. The data for increase in milk-yields (in gms.) were observed
after the month but the increase in milk-yield for the cow of 4th
breed and 1st lactation period supplied with diet A, could not be
recorded.

Row (lactation) total containing the missing yield   = 396 gms.
Column (breed) total containing the missing yield    = 386 gms.
Total under diet (treatment) A                       = 347 gms.
Grand sum                                            = 2147 gms.
T. S. S.                                             = 9460.0736

Source of variation          D. F.   S. S.        M. S. S.     F
Rows (lactation periods)     -       -            272.1264     -
Columns (breeds)             -       -            253.7264     -
Diets (Treats.) adj.         -       -            -            -
Error                        11      1215.1168    -            -
Totals                       23

[a] Estimate the missing increase in milk-yield ?

[b] Calculate the adjusting factor to treatment sum of squares
and obtain the adjusted treatment S. S. ?

[c] Also complete the A. V. T. and state your conclusions
regarding the homogeneity of breeds, lactation periods and the
different diets ?

Ans. [a] X = 112.6 approx.
[b] adj. factor = 3.6736, adj. treatment S. S. = 6137.8720
[c] F = 2.46 for rows (lactation periods)
    F = 2.29 for columns (breeds)
    F = 13.89 for treatments








CHAPTER VII 


Factorial Experiments 


Concept : In the foregoing experiments, performed either in
C. R. D., R. B. D. or L. S. D., we were concerned only with the
variation in a single factor, like different varieties or manures,
different supply-rates of the same manure, cultural treatments etc.
In field experimentation, situations very often arise when we have
to test the variation in a number of factors simultaneously. For example,
we may be interested in selecting the best variety of all the available
ones of a certain crop and the best rate of nitrogen supply for a newly
acquired land. In order to achieve the object, we may first compare the
varieties in the absence of nitrogen and, adopting the best variety as shown
by the experiment, proceed to select the best rate of nitrogen supply
by performing a second experiment. The conclusions drawn as above
are valid only when the effect exerted by any one of the two factors
is independent of the other, but this is not always true. In most
cases, the response to the first factor varies according to the level
of the second factor (i. e. the two factors interact with each other). For
example, a higher level of irrigation is essential to secure an adequate
response of yield when a heavy dose of nitrogen has been applied to
a certain crop. Thus the use of the scheme discussed above is
limited. Another point against this approach is that the precisions
of the two experiments are different and hence the results cannot be
compared.

The most informative and efficient approach, when several
factors are under study, is to compare all the possible combinations
of the different levels of all the factors simultaneously in the same
experiment. This approach is known as the Factorial Concept of
experimentation. As it compares all the factors in one and the
same experiment with equal precision, there is a greater scope for
this type of experimentation in comparison to the traditional methods,
and it makes the results directly comparable. It also saves a great
deal of time and experimental material. The main advantage of
the factorial scheme is that it can be used to study the simultaneous
variation among several factors, whether they are independent or
interact with each other, and it provides a way to test the significance
of the interaction between two or more factors.

Definitions and Symbolic representations— Before proceeding 
to the analysis of the factorial experiments, we set forth some 
definitions and give the symbolic representations to be used. 

(i) Case of 2^2 factorial experiment :

In order to have a clear conception of the terms to be defined,
we consider an experiment on wheat with two factors each at two
levels. These are nitrogen, none (n0) versus 22 kgms/Hect. (n1), and
super phosphate, none (p0) versus 22 kgms/Hect. (p1). This experiment
is called a 2x2 or 2^2 factorial experiment. Let us suppose that the
mean yields (in maunds/acre) of the four possible treatment
combinations are as given below:

S. No.            1       2       3       4
Treat. comb. :    n0p0    n0p1    n1p0    n1p1
Mean yields :     26      30      28      34

These yields can be put in the following tabular form:

Super phosphate         Nitrogen          Response (n1 - n0)
                        n0      n1
p0                      26      28        2
p1                      30      34        4
Response (p1 - p0)      4       6
1 
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From the above table, we see that— 

(i) The application of super phosphate has increased the
yield by 4 maunds/acre in the absence of nitrogen and 6 maunds/acre
in the presence of nitrogen. These are called the Simple effects of
super phosphate. The average effect, (4 + 6)/2 = 5 maunds/acre, of its
application is called the Main effect of super phosphate.

(ii) The application of nitrogen has increased the yield of
wheat by 2 maunds/acre in the absence of super phosphate and
4 maunds/acre in the presence of super phosphate. These are called
the Simple effects of nitrogen. The average effect, (2 + 4)/2 = 3 maunds/acre,
of its application is called the Main effect of nitrogen.

(iii) The 2 simple effects of nitrogen are not the same; this
fact indicates that the two factors are not independent. If they were
so, then the increase in yield due to the application of nitrogen
should have been the same at the two different levels of super
phosphate. Similarly, the two simple effects of super phosphate are
not the same, leading to the conclusion that the two factors are not
independent but interact with each other. The measure of the extent
to which the two factors interact is given either by half of the
difference between the simple effect of nitrogen in the presence of
super phosphate and that in the absence of super phosphate,
i. e. (4 - 2)/2 = 1 maund/acre, or by half of the difference between
the simple effect of super phosphate in the presence of nitrogen and
that in the absence of nitrogen, i. e. (6 - 4)/2 = 1 maund/acre.
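The simple effects, main effects and interaction described in (i) to (iii) can be recomputed directly from the four mean yields. The following Python sketch is only an illustration; the dictionary keys and the function name `effects_2x2` are assumptions made for this example.

```python
# Mean yields (maunds/acre) of the four treatment combinations, from the text.
means = {"n0p0": 26, "n0p1": 30, "n1p0": 28, "n1p1": 34}


def effects_2x2(m):
    """Return simple effects, main effects and interaction of a 2x2 factorial."""
    simple_n = (m["n1p0"] - m["n0p0"], m["n1p1"] - m["n0p1"])  # at p0, at p1
    simple_p = (m["n0p1"] - m["n0p0"], m["n1p1"] - m["n1p0"])  # at n0, at n1
    main_n = sum(simple_n) / 2          # average of the two simple effects
    main_p = sum(simple_p) / 2
    interaction = (simple_n[1] - simple_n[0]) / 2
    return simple_n, simple_p, main_n, main_p, interaction


simple_n, simple_p, main_n, main_p, interaction = effects_2x2(means)
print(simple_n, simple_p, main_n, main_p, interaction)
```

Computing the interaction from the simple effects of super phosphate instead, (6 - 4)/2, gives the same value, which is why either definition may be used.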

For simplicity, the first level of any factor in a treatment
combination is signified by the absence of its letter, and the suffix of
the second level is dropped. The treatment combination consisting of the
first level of all the factors is denoted by (1). Now the four treatment
combinations n0p0, n0p1, n1p0 and n1p1 can be written as
(1), p, n and np respectively. The symbol (p) stands for the total
yield of the plots receiving the treatment combination n0p1 or p.
Similar meanings are attached to the other like symbols.
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Computation of main effects and interactions : For computing 
a main effect or interaction, first we write down its corresponding
expression $\frac{1}{2r}(n \pm 1)(p \pm 1)$, where r is the no. of replications and
each bracket takes the +ve or the -ve sign according to the absence
or presence of the corresponding capital letter in the symbol for
the main effect or the interaction to be computed. The expression is
then expanded algebraically and the yields for the treatment
combinations are substituted. Thus:

$$N = \frac{1}{2r}\,[(n-1)(p+1)] = \frac{1}{2r}\,[np + n - p - (1)] = \frac{1}{2}\left[\frac{(np)}{r} + \frac{(n)}{r} - \frac{(p)}{r} - \frac{(1)}{r}\right] = \frac{1}{2}\,[34 + 28 - 30 - 26] = 3$$

Symbolic representation : It is customary to denote the factors,
their main effects and interactions by capital letters and the different
levels and their combinations by small letters. In the present
example,

N           denotes the factor nitrogen and the main effect of nitrogen,
P           denotes the factor super phosphate and the main effect of
            super phosphate,
NP or PN    denotes the interaction of the two factors nitrogen and
            super phosphate. It is called the two-factor interaction or
            first-order interaction.
n0          denotes the first level of nitrogen (none),
n1          denotes the second level of nitrogen (22 kgms/Hect.),
p0          denotes the first level of super phosphate (none),
p1          denotes the second level of super phosphate (22 kgms/Hect.),
n0p0        denotes the combination of nitrogen and super phosphate
            both at first level,
n0p1        denotes the combination of nitrogen at first level and
            phosphate at second level,
n1p0        denotes the combination of nitrogen at second level and
            phosphate at first level,
n1p1        denotes the combination of nitrogen and phosphate both at
            second level.

Proceeding as for N, the main effect of super phosphate and the
interaction are:

$$P = \frac{1}{2r}\,[(n+1)(p-1)] = \frac{1}{2r}\,[np + p - n - (1)] = \frac{1}{2}\,[34 + 30 - 28 - 26] = 5$$

$$NP = \frac{1}{2r}\,[(n-1)(p-1)] = \frac{1}{2r}\,[np - p - n + (1)] = \frac{1}{2}\,[34 - 30 - 28 + 26] = 1$$

The symbol [N] = [(n - 1)(p + 1)] stands for the total factorial
effect of nitrogen, with similar meanings for the others.

Statistical analysis : The factorial experiments are performed
either in C. R. D., R. B. D. or L. S. D. and so their analysis under
the factorial scheme remains the same except that the treatment sum of
squares is split up into its components, each with 1 d. f. In the present
example, the component sums of squares are computed as given
below:


Treat. component    D. F.    S. S.
N                   1        [(n - 1)(p + 1)]^2 / 2^2 r = [N]^2 / 2^2 r
P                   1        [(n + 1)(p - 1)]^2 / 2^2 r = [P]^2 / 2^2 r
NP                  1        [(n - 1)(p - 1)]^2 / 2^2 r = [NP]^2 / 2^2 r
Totals              3        S. S. (treat.)

By Yates' method : It is a simple and rapid method giving
us a convenient way of computing the treatment component sums of
squares. The computational work is carried out in the following
tabular form:

Treat. comb.   Total    I              II                        Total fact.   S. S.                   due to
in standard    yield                                             effect
order
(1)            (1)      (1) + (n)      (1) + (n) + (p) + (np)    G             G^2 / 2^2 r = C. F.     -
n              (n)      (p) + (np)     (n) - (1) + (np) - (p)    [N]           [N]^2 / 2^2 r           N
p              (p)      (n) - (1)      (p) + (np) - (1) - (n)    [P]           [P]^2 / 2^2 r           P
np             (np)     (np) - (p)     (np) - (p) - (n) + (1)    [NP]          [NP]^2 / 2^2 r          NP

The first two figures in column I are the totals of the two pairs
and the last two figures are the differences of the same pairs in the
column of total yield (where the first value of a pair is subtracted
from the second). Column II has been obtained by similar
operations on the pairs of column I.
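The pairing rule just described is mechanical and extends to any 2^n experiment by repeating it n times. The sketch below is an illustrative Python version, not part of the original text; the function name `yates` is an assumption, and the four mean yields of the wheat example are used as if they were totals from a single replication (r = 1).

```python
def yates(totals):
    """Apply Yates' sum-and-difference passes to treatment totals.

    `totals` must be in standard order, e.g. (1), n, p, np for a 2^2
    experiment, and its length must be a power of two. The result holds
    G first, followed by the total factorial effects in the same order.
    """
    n_obs = len(totals)
    n_factors = n_obs.bit_length() - 1
    if n_obs < 2 or 2 ** n_factors != n_obs:
        raise ValueError("number of totals must be a power of two")
    column = list(totals)
    for _ in range(n_factors):
        pairs = [(column[i], column[i + 1]) for i in range(0, n_obs, 2)]
        sums = [first + second for first, second in pairs]
        diffs = [second - first for first, second in pairs]  # second minus first
        column = sums + diffs
    return column


# Wheat example in standard order (1), n, p, np.
r = 1
grand, big_n, big_p, big_np = yates([26, 28, 30, 34])
print(grand, big_n, big_p, big_np)
print(big_n / (2 * r), big_p / (2 * r), big_np / (2 * r))
```

Dividing each total factorial effect by 2r recovers the main effects and the interaction found earlier, while squaring it and dividing by 2^2 r gives the corresponding sum of squares.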

Exp. (1) An experiment was planned to study the effect of
sulphate of potash and super phosphate on the yield of potatoes. All the
combinations of 2 levels of super phosphate [0 cent (p0) and 5 cent
(p1)/acre] and two levels of sulphate of potash [0 cent (k0) and 5 cent
(k1)/acre] were studied in a randomized block design with 4
replications for each. The following yields (lbs per plot) were
obtained:

Block I      (1) 23     k 25       p 22       kp 38
Block II     p 40       (1) 26     k 36       kp 38
Block III    (1) 29     k 20       pk 30      p 20
Block IV     kp 34      k 31       p 24       (1) 28

Analyse the data and give your conclusions ?

Solution : Taking deviations from y = 29, we prepare the
following table for computation of the S. S. due to treatments and
blocks (squares are shown in brackets):

Treat. comb.    Block I      Block II     Block III    Block IV    Totals (= T)    T^2
(1)             -6 (36)      -3 (9)       0 (0)        -1 (1)      -10 (46)        100
k               -4 (16)      7 (49)       -9 (81)      2 (4)       -4 (150)        16
p               -7 (49)      11 (121)     -9 (81)      -5 (25)     -10 (276)       100
kp              9 (81)       9 (81)       1 (1)        5 (25)      24 (188)        576
Totals (= B)    -8 (182)     24 (260)     -17 (163)    1 (55)      0 = G (660)
B^2             64           576          289          1

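Ho : The data is homogeneous with respect to the blocks and
the treatments.

$$\text{C. F.} = \frac{G^2}{N} = \frac{(0)^2}{16} = 0$$

$$\text{T. S. S.} = \sum_i \sum_j y_{ij}^2 - \text{C. F.} = 660 - 0 = 660$$

$$\text{S. S. (blocks)} = \sum_j \frac{B_j^2}{4} - \text{C. F.} = \frac{64+576+289+1}{4} - 0 = \frac{930}{4} = 232.50$$

$$\text{S. S. (treat.)} = \sum_i \frac{T_i^2}{4} - \text{C. F.} = \frac{100+16+100+576}{4} - 0 = \frac{792}{4} = 198.0$$

S. S. (error) = T. S. S. - S. S. due to (blocks + treat.)
= 660 - (232.50 + 198.0) = 660 - 430.5 = 229.50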
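These sums of squares do not depend on the working origin, so they can be verified from the raw plot yields without first subtracting 29. The Python sketch below is illustrative only; the function name `rbd_sums_of_squares` and the layout of the `yields` dictionary (one list per treatment, ordered by block I to IV) are assumptions for this example.

```python
# Plot yields (lbs per plot) of Exp. (1): one list per treatment,
# entries ordered by block I, II, III, IV.
yields = {
    "(1)": [23, 26, 29, 28],
    "k":   [25, 36, 20, 31],
    "p":   [22, 40, 20, 24],
    "kp":  [38, 38, 30, 34],
}


def rbd_sums_of_squares(data):
    """Return (total, blocks, treatments, error) S. S. for an R. B. D."""
    treatments = list(data)
    n_blocks = len(data[treatments[0]])
    n_plots = len(treatments) * n_blocks
    grand = sum(sum(values) for values in data.values())
    cf = grand ** 2 / n_plots                       # correction factor
    total_ss = sum(y * y for values in data.values() for y in values) - cf
    treat_ss = sum(sum(values) ** 2 for values in data.values()) / n_blocks - cf
    block_totals = [sum(data[t][j] for t in treatments) for j in range(n_blocks)]
    block_ss = sum(b * b for b in block_totals) / len(treatments) - cf
    error_ss = total_ss - block_ss - treat_ss
    return total_ss, block_ss, treat_ss, error_ss


total_ss, block_ss, treat_ss, error_ss = rbd_sums_of_squares(yields)
print(total_ss, block_ss, treat_ss, error_ss)
```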
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The treatment component sum of squares are obtained by 
Yates' method:

Treatment      Total    I       II      Total factorial    S. S.                    due to
combination    yield                    effect
(1)            -10      -14     0       G                  (0)^2/16 = 0 = C. F.     -
k              -4       14      40      [K]                (40)^2/16 = 100          K
p              -10      6       28      [P]                (28)^2/16 = 49           P
kp             24       34      28      [KP]               (28)^2/16 = 49           KP
                                                           Total = 198

Now we prepare the following A. V. T.— 


Source of     D. F.   S. S.     M. S. S.   F cal.    F at 5%   F at 1%
variation
Blocks        3       232.5     77.5       3.04      3.86      6.99
Treat.        3       198.0     66.0       2.59      3.86      6.99
   K          1       100       100        3.92      5.12      10.56
   P          1       49        49         1.92      5.12      10.56
   KP         1       49        49         1.92      5.12      10.56
Error         9       229.5     25.5
Totals        15      660.0

(The components K, P and KP together account for the 3 d. f. and the S. S. of 198 for treatments.)
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The calculated values of F corresponding to the blocks, 
treatments and treatment-components come out to be insignificant 
for all. 

Inference : There is no significant difference between the blocks
and the treatments, i. e. the data is homogeneous with respect to the
blocks and the treatments.

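S. E. : The S. E. of any main effect or interaction in the case of a
2^n factorial experiment is $\sqrt{V_E / (2^{n-2}\, r)}$, where $V_E$ is the error
mean square and r is the no. of replications.

For n = 2, the S. E. = $\sqrt{V_E / r}$

and for n = 3, the S. E. = $\sqrt{V_E / 2r}$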

Case of 2^3 factorial experiment : In the case of a 2^3 factorial
experiment (i. e. 3 factors each at two levels), let the factors be A, B and C
with the levels a0, a1; b0, b1 and c0, c1
respectively. Then the eight possible treatment combinations in
standard order will be given as (1), a, b, ab, c, ac, bc and abc. A 2^3
factorial experiment can be conducted either in C. R. D., R. B. D.
or L. S. D. and its analysis remains the same except that the
treatment S. S. is split up into its components, each with 1 d. f. By
a straightforward extension of the rules given for the 2^2 factorial
experiment, we have the treatment-component S. S. as given below:

Treatment     D. F.    S. S.
component
A             1        [(a - 1)(b + 1)(c + 1)]^2 / 2^3 r = [A]^2 / 2^3 r
B             1        [(a + 1)(b - 1)(c + 1)]^2 / 2^3 r = [B]^2 / 2^3 r
AB            1        [(a - 1)(b - 1)(c + 1)]^2 / 2^3 r = [AB]^2 / 2^3 r
C             1        [(a + 1)(b + 1)(c - 1)]^2 / 2^3 r = [C]^2 / 2^3 r
AC            1        [(a - 1)(b + 1)(c - 1)]^2 / 2^3 r = [AC]^2 / 2^3 r
BC            1        [(a + 1)(b - 1)(c - 1)]^2 / 2^3 r = [BC]^2 / 2^3 r
ABC           1        [(a - 1)(b - 1)(c - 1)]^2 / 2^3 r = [ABC]^2 / 2^3 r
Totals        7        S. S. (treatments)
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Exp. No. (2) The following lay out gives the barley yield (in 
kgms./plot) of 32 plots of a 2^3 factorial experiment in which three
factors each at 2 levels are to be tested. Analyse the data and state
your conclusions ? The notations are:

m for manure,
n for nitrogen and
p for phosphorus.

Replication    yield/plot (in kgms.)

I      (1) 30     m 45      mn 50     n 24      p 27      np 34      mp 40     mnp 74
II     m 39       (1) 41    p 37      n 34      mp 32     mnp 63     np 29     mn 74
III    (1) 25     n 22      m 46      np 30     mp 38     mnp 68     mn 54     p 25
IV     mn 62      m 43      p 26      n 16      (1) 17    mp 46      np 28     mnp 61

Solution:

Ho : The data is homogeneous with respect to the treatments
and the replications (blocks).

Taking deviations from the general mean 40, so that G = 0, we have:

$$\text{C. F.} = \frac{G^2}{N} = \frac{(0)^2}{32} = 0$$

$$\text{T. S. S.} = \sum_i \sum_j y_{ij}^2 - \text{C. F.} = 8068 - 0 = 8068$$

$$\text{S. S. (blocks)} = \sum_j \frac{B_j^2}{8} - \text{C. F.} = \frac{16+841+144+441}{8} - 0 = \frac{1442}{8} = 180.25$$

$$\text{S. S. (treat.)} = \sum_i \frac{T_i^2}{4} - \text{C. F.} = \frac{2209+169+4096+6400+2025+16+1521+11236}{4} - 0 = \frac{27672}{4} = 6918.0$$

S. S. (error) = T. S. S. - S. S. due to (blocks + treat.)
= 8068 - (180.25 + 6918.00)
= 8068 - 7098.25 = 969.75

The treatment-component sums of squares are obtained by Yates'
method:

Treat.    Total    I       II      III     Total fact.   S. S.                        due to
comb.     yield                            effect
(1)       -47      -34     -18     0       G             (0)^2/32 = 0 = C. F.         -
m         13       16      18      390     [M]           (390)^2/32 = 4753.125        M
n         -64      -49     204     166     [N]           (166)^2/32 = 861.125         N
mn        80       67      186     188     [MN]          (188)^2/32 = 1104.500        MN
p         -45      60      50      36      [P]           (36)^2/32 = 40.500           P
mp        -4       144     116     -18     [MP]          (-18)^2/32 = 10.125          MP
np        -39      41      84      66      [NP]          (66)^2/32 = 136.125          NP
mnp       106      145     104     20      [MNP]         (20)^2/32 = 12.500           MNP
                                                         Total = 6918.00
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Now we arrive at the following A. V. T. — 


Source of     D. F.   S. S.       M. S. S.     F cal.      F at 5%   F at 1%
variation
Blocks        3       180.250     60.0833      1.30        3.07      4.87
Treat.        7       6918.000    988.2857     21.40**     2.49      3.65
   M          1       4753.125    4753.1250    102.93**    4.32      8.02
   N          1       861.125     861.1250     18.65**     4.32      8.02
   MN         1       1104.500    1104.5000    23.92**     4.32      8.02
   P          1       40.500      40.5000      1.14        248.25    6214.5
   MP         1       10.125      10.1250      4.56        248.25    6214.5
   NP         1       136.125     136.1250     2.95        4.32      8.02
   MNP        1       12.500      12.5000      3.69        248.25    6214.5
Error         21      969.750     46.1786
Totals        31      8068.000

(For P, MP and MNP the mean square is smaller than the error mean square, so the ratio is taken as error M. S. S. / component M. S. S. and is compared with F for 21 and 1 d. f.)

Inference : The F-test indicates that the main effects M, N and
the interaction MN are highly significant, which leads to the
conclusion that, over all, the application of manure and nitrogen
increases the yields and that they are not independent but interact
with each other.

Exp. (3) : In an N, P, K trial, with two levels of each fertilizer
and 3 replicates, the treatment-totals were:

(1)     n       p       k       np      nk      pk      npk
94      108     97      98      114     123     111     124

The error mean square is known to be 8.90. Calculate the
sum of squares for N and NP and test their significance ?
(M. Sc. Agra, 1962) 
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Solution : 

Ho : The effects of N and NP are not significant.

$$\text{S. S. (N)} = \frac{[(n-1)(p+1)(k+1)]^2}{2^3 r} = \frac{[N]^2}{8 \times 3}$$

$$= \frac{[(npk)+(nk)-(pk)-(k)+(np)+(n)-(p)-(1)]^2}{24} = \frac{[124+123-111-98+114+108-97-94]^2}{24} = \frac{(69)^2}{24} = 198.375$$

$$\text{S. S. (NP)} = \frac{[(n-1)(p-1)(k+1)]^2}{2^3 r} = \frac{[NP]^2}{8 \times 3}$$

$$= \frac{[(npk)-(nk)-(pk)+(k)+(np)-(n)-(p)+(1)]^2}{24} = \frac{[124-123-111+98+114-108-97+94]^2}{24} = \frac{(-9)^2}{24} = 3.375$$
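The sign attached to each treatment total follows from expanding the brackets: a total enters with a plus sign when it contains an even number (including zero) of missing letters among those named in the effect, and with a minus sign otherwise. The short Python sketch below applies that rule to the totals of this example; the function names `effect_total` and `effect_ss` are illustrative assumptions.

```python
# Treatment totals of Exp. (3); "(1)" is the combination with every
# factor at its first level.
totals = {"(1)": 94, "n": 108, "p": 97, "k": 98,
          "np": 114, "nk": 123, "pk": 111, "npk": 124}


def effect_total(effect, treatment_totals):
    """Total factorial effect, e.g. effect='np' gives [NP].

    Each letter of `effect` contributes a bracket (x - 1): the sign is
    +1 if the treatment contains that letter and -1 if it does not.
    Letters not named in `effect` contribute (x + 1), i.e. always +1.
    """
    result = 0
    for label, value in treatment_totals.items():
        present = set() if label == "(1)" else set(label)
        sign = 1
        for letter in effect:
            if letter not in present:
                sign = -sign
        result += sign * value
    return result


def effect_ss(effect, treatment_totals, reps):
    """Sum of squares with 1 d. f. for the named effect."""
    return effect_total(effect, treatment_totals) ** 2 / (len(treatment_totals) * reps)


error_ms = 8.90
ss_n = effect_ss("n", totals, reps=3)
ss_np = effect_ss("np", totals, reps=3)
print(effect_total("n", totals), ss_n, round(ss_n / error_ms, 2))
print(effect_total("np", totals), ss_np)
```

Since each component has 1 d. f., its sum of squares is also its mean square, so the variance ratio for N is simply `ss_n / error_ms`.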


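A. V. T.:

Source of     D. F.   S. S.      M. S. S.    F cal.     F at 5%   F at 1%
variation
N             1       198.375    198.375     22.29**    4.60      8.86
NP            1       3.375      3.375       2.63       245       6142
Error         14      -          8.90

(For NP the mean square is smaller than the error mean square, so the ratio is taken as 8.90 / 3.375 and compared with F for 14 and 1 d. f.)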

Inference : The main effect of the fertilizer N is highly 
significant which proves that over all there is a significant response 
to N and the interaction NP is insignificant. 

(iii) Case of m x n factorial experiment : A factorial experiment
with 2 factors A and B at m and n levels respectively is called an
m x n factorial experiment. Let the m levels of A be denoted by
a0, a1, a2, ..., a(m-1) and the n levels of B by b0, b1, ..., b(n-1). There are
mn possible treatment combinations. If r denotes the no. of
replications, then the symbol (ai bj) denotes the total yield of the r plots
which receive the treatment combination ai bj.
The Partitioning of treatment S. S. : The treatment S. S. with
(mn - 1) d. f. will be split up into the following three components:

(1) S. S. due to A with (m - 1) d. f.,

(2) S. S. due to B with (n - 1) d. f.,

and (3) S. S. due to AB with (m - 1)(n - 1) d. f., by arranging
the total yields in the m x n table given below:

B \ A       a0              a1              ...     a(m-1)               Totals
b0          (a0 b0)         (a1 b0)         ...     (a(m-1) b0)          (b0)
b1          (a0 b1)         (a1 b1)         ...     (a(m-1) b1)          (b1)
...         ...             ...             ...     ...                  ...
b(n-1)      (a0 b(n-1))     (a1 b(n-1))     ...     (a(m-1) b(n-1))      (b(n-1))
Totals      (a0)            (a1)            ...     (a(m-1))             G

Now we compute

$$\text{S. S. (treat.)} = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \frac{(a_i b_j)^2}{r} - \text{C. F.}, \quad \text{where C. F.} = \frac{G^2}{N} \text{ and } N = r \cdot m \cdot n,$$

$$\text{S. S. (A)} = \sum_{i} \frac{(a_i)^2}{r\,n} - \text{C. F.},$$

$$\text{S. S. (B)} = \sum_{j} \frac{(b_j)^2}{r\,m} - \text{C. F.},$$

and S. S. (AB) = S. S. (treat.) - S. S. (A) - S. S. (B)
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Standard Error : To compare the responses due to levels of 
A, B and different treatment combinations, we require three different
S. E.s:

(i) S. E. of the difference between the two means of A
= $\sqrt{2 V_E / nr}$

(ii) S. E. of the difference between the two means of B
= $\sqrt{2 V_E / mr}$

and (iii) S. E. of the difference between the two means of
treatment combinations
= $\sqrt{2 V_E / r}$

Exp. No. (4) Three varieties of wheat (v1, v2 and v3) and 4
doses of ammonium sulphate [none (n0), 22 kgms./Hect. (n1), 44
kgms./Hect. (n2) and 66 kgms./Hect. (n3)] were tested in a R. B. D.
with 4 replicates. The layout with plot-yields (in kgms.) is given
below. Analyse the data and state your conclusions ?

(plot size 1/80 acre)

Rep. I                                        Rep. II
v2n2 17   v3n3 16   v1n0 15   v3n1 16         v1n1 15   v1n0 15   v2n1 16   v1n3 13
v1n3 15   v3n2 19   v1n2 18   v2n0 13         v2n3 18   v3n2 20   v3n0 15   v2n2 17
v3n0 14   v2n1 15   v2n3 18   v1n1 16         v1n2 17   v3n1 16   v3n3 18   v2n0 15

Rep. III                                      Rep. IV
v1n2 15   v2n2 19   v2n1 17   v2n0 13         v3n0 15   v1n1 16   v1n2 17   v3n1 20
v1n1 17   v1n0 15   v3n3 19   v1n3 16         v2n3 19   v2n0 17   v3n2 22   v1n3 17
v2n3 17   v3n1 17   v3n2 20   v3n0 14         v2n2 20   v3n3 19   v2n1 20   v1n0 16
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Ho : The data is homogeneous with respect to the blocks (replicates) and the treatments. 
Taking deviations from y = 17 and totalling them by blocks and by treatment
combinations (block totals -12, -9, -5 and 14; grand total G = -12), we get:

$$\text{C. F.} = \frac{G^2}{N} = \frac{144}{48} = 3.0$$

$$\text{T. S. S.} = \sum_i \sum_j y_{ij}^2 - \text{C. F.} = 208 - 3 = 205.00$$

$$\text{S. S. (treat.)} = \sum_i \frac{T_i^2}{4} - \text{C. F.} = \frac{542}{4} - 3.0 = 135.50 - 3.0 = 132.50$$

$$\text{S. S. (blocks)} = \sum_j \frac{B_j^2}{12} - \text{C. F.} = \frac{144+81+25+196}{12} - 3.0 = 37.17 - 3.0 = 34.17$$

S. S. (error) = T. S. S. - S. S. (treat. + blocks)
= 205.00 - (132.50 + 34.17) = 205.00 - 166.67 = 38.33


Now the treatment S. S. can be partitioned into its components
by forming the following table of treatment totals (deviations from 17):

V \ N           n0      n1      n2      n3      Totals (= V)    (Totals)^2
v1              -7      -4      -1      -7      -19             361
v2              -10     0       5       4       -1              1
v3              -10     1       13      4       8               64
Totals (= N)    -27     -3      17      1       -12 = G         144
(Totals)^2      729     9       289     1       144

$$\text{S. S. (V)} = \sum \frac{V^2}{16} - \text{C. F.} = \frac{361+1+64}{16} - 3.0 = 26.625 - 3.000 = 23.625$$

$$\text{S. S. (N)} = \sum \frac{N^2}{12} - \text{C. F.} = \frac{729+9+289+1}{12} - 3.0 = 85.667 - 3.000 = 82.667$$

S. S. (NV) = S. S. (treat.) - S. S. (V + N)
= 132.500 - (23.625 + 82.667) = 132.500 - 106.292
= 26.208

Finally we arrive at the following A. V. T.:

Source of     D. F.   S. S.      M. S. S.       F cal.     F at 5%   F at 1%
variation
Blocks        3       34.170     11.39          9.802**    2.89      4.44
Treat.        11      132.500    12.0455        10.36**    2.09      2.84
   V          2       23.625     11.8125        10.16**    3.29      5.315
   N          3       82.667     27.5557        23.71**    2.89      4.44
   NV         6       26.208     4.368          3.76**     2.39      3.4
Error         33      38.33      1.162 = V_E
Totals        47      205.000

(The components V, N and NV together account for the 11 d. f. and the S. S. of 132.500 for treatments.)

Inference : The calculated values of F corresponding to the
blocks (replicates) and the treatment-components N and V come out to be
highly significant, showing that the main effects of ammonium
sulphate and varieties of wheat are significant. The two factors
(varieties of wheat and ammonium sulphate) are not independent of
each other but interact, as the effect of the interaction (NV) is also
significant.
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Now we compute the S. E & of the differences between the two 

means of V, N and NV by the following formulae:

(i) S. E. of the difference between the two means of V

$$= \sqrt{\frac{2 V_E}{nr}} = \sqrt{\frac{2 \times 1.162}{4 \times 4}} = \sqrt{0.14525} = 0.3811$$

(ii) S. E. of the difference between the two means of N

$$= \sqrt{\frac{2 V_E}{mr}} = \sqrt{\frac{2 \times 1.162}{3 \times 4}} = \sqrt{0.1937} = 0.4401$$

and (iii) S. E. of the difference between the two means of NV

$$= \sqrt{\frac{2 V_E}{r}} = \sqrt{\frac{2 \times 1.162}{4}} = \sqrt{0.581} = 0.7622$$
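All three standard errors have the same form, the square root of twice the error mean square divided by the number of plots that make up each mean being compared. A small Python sketch under that reading is given below; the function name `se_of_difference` is an assumption, and m = 3 varieties, n = 4 doses and r = 4 replicates are taken from the example.

```python
import math


def se_of_difference(error_ms, plots_per_mean):
    """S. E. of the difference between two means, each based on
    `plots_per_mean` plots, given the error mean square."""
    return math.sqrt(2 * error_ms / plots_per_mean)


error_ms = 1.162        # V_E from the analysis of variance table
m, n, r = 3, 4, 4       # varieties, doses of ammonium sulphate, replicates

se_v = se_of_difference(error_ms, n * r)    # a variety mean covers n*r plots
se_n = se_of_difference(error_ms, m * r)    # a dose mean covers m*r plots
se_nv = se_of_difference(error_ms, r)       # a combination mean covers r plots
print(round(se_v, 4), round(se_n, 4), round(se_nv, 4))
```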

The critical differences (C. D.) at 5% for the S. E.s given in (i), (ii) and
(iii) will be computed in the following manner:

(i) C. D. = (S. E. of difference) x t(33) at 5%
= 0.3811 x 3.29 = 1.2538 kgms./plot

(ii) C. D. = (S. E. of difference) x t(33) at 5%
= 0.4401 x 3.29 = 1.4479 kgms./plot

and (iii) C. D. = (S. E. of difference) x t(33) at 5%
= 0.7622 x 3.29 = 2.5076 kgms./plot


Effect of ammonium sulphate:

dose of N            average yield/plot    average yield/acre
(in kgms./Hect.)     (in kgms.)            (in kgms.)
0                    14.75                 14.75 x 80 = 1180
22                   16.75                 16.75 x 80 = 1340
44                   18.4167               18.4167 x 80 = 1473.336
66                   17.0833               17.0833 x 80 = 1366.664

C. D. at 5% for ammonium sulphate-means = 1.4479 x 80
= 115.832 kgms./acre

An inspection of the above data reveals the fact that the
application of ammonium sulphate has exhibited a significant effect
on the yields. The yields obtained by applying ammonium sulphate
at the rate of 22 kgms./Hect., 44 kgms./Hect. and 66 kgms./Hect.
are higher than that obtained without applying it. The rate of 44
kgms./Hect. is the best of all.


Varietal effect:

Variety      average yield per plot    average yield per acre
of wheat     (in kgms.)                (in kgms.)
v1           12.25                     12.25 x 80 = 980
v2           16.9167                   16.9167 x 80 = 1353.336
v3           17.67                     17.67 x 80 = 1413.6

C. D. at 5% for variety of wheat = 1.2538 x 80 = 100.304
kgms./acre

The main effect of variety (V) is highly significant, which proves
that over all there is a significant response to V. The maximum
yield has been recorded in the case of variety v3 followed by v2, but
their difference is not significant. The varieties v2 and v3 are
significantly different from v1.

The effect of NV:

(average yields in kgms/acre)

V \ N     n0                            n1       n2       n3
v1        (-7/4 + 17) x 80 = 1220       1280     1340     1220
v2        1160                          1360     1460     1440
v3        1160                          1380     1620     1440

The C. D. at 5% for interaction (NV) = 2.5076 x 80
= 200.608 kgms/acre

The interaction (NV) is significant at 5% level, which shows
that over all the treatment combinations differ significantly. The
maximum yield has been recorded for the treatment combination
n2v3 followed by n2v2, but their difference is not significant. The
variety v3 with ammonium sulphate at the rate of 44 kgms/Hect. is
the best combination.

Exp. No. (5) In a manurial-cum-variety trial (3 manures × 2
varieties) laid out in a randomized block design with four replications,
the following treatment totals were obtained:

Totals of 4 plots

Treatments :  v1m1   v1m2   v1m3   v2m1   v2m2   v2m3
Totals     :  21     30     40     27     40     42

The residual variance for 15 D. F. = 1.5450. Calculate the main
effects (M and V) and interaction (MV) and test their significance.

(M. Sc. Ag. Agra, 1959)

Solution:

Ho : The main effects (M and V) and interaction (MV) are
not significant.

$$\text{C. F.} = \frac{G^2}{N} = \frac{(200)^2}{24} = 1666.67, \quad \text{since } N = r \times m \times n = 4 \times 3 \times 2 = 24$$

$$\text{S. S. (treat.)} = \frac{\sum y_{ij}^2}{4} - \text{C. F.}, \quad \text{where } y_{ij} \text{ is the yield of 4 plots}$$

$$= \frac{7034}{4} - 1666.67 = 1758.50 - 1666.67 = 91.83$$

$$\text{S. S. (M)} = \frac{2304 + 4900 + 6724}{8} - 1666.67 = 1741 - 1666.67 = 74.33$$

$$\text{S. S. (V)} = \frac{8281 + 11881}{12} - 1666.67 = \frac{20162}{12} - 1666.67 = 1680.167 - 1666.67 = 13.497$$

$$\text{S. S. (MV)} = \text{S. S. (treat.)} - \text{S. S. (M + V)} = 91.83 - (74.33 + 13.497) = 91.83 - 87.827 = 4.003$$

Now we prepare the following A. V. T.:

Source of     D. F.   S. S.     M. S. S.   F cal.     F at 5%   F at 1%
variation
Treatment     5       91.83     18.366     11.82**    2.90      4.56
  M           2       74.33     37.165     24.05**    3.68      6.36
  V           1       13.497    13.497     8.74**     4.54      8.68
  MV          2       4.003     2.0015     1.29       3.68      6.36
Error         15      -         1.545      -          -         -

Inference: The calculated variance ratios for treatments and for the
main effects M and V come out to be highly significant, which shows that
the main effects of manures and varieties are significant. The two
factors (manure and variety) are independent of each other and do
not interact, as their interaction effect is not significant.

Exp. No. (6) An experiment was planned to study the effect
of ammonium sulphate (N) and super phosphate (P) on the yield of
maize (Ganga hybrid No. 1). All combinations of four levels of
super phosphate (0 kgms, 10 kgms, 20 kgms and 30 kgms/acre) and
three levels of ammonium sulphate (0 kgms, 10 kgms and 20 kgms/acre)
were studied in a R. B. D. with 2 replications. The following yields
(kgms/plot, plot size = 1/40 acre) were obtained:


Treat.   Rep. I   Rep. II   Totals
n0p0     181.2    184.2     365.4
n0p1     184.5    184.5     369.0
n0p2     191.5    191.9     383.4
n0p3     199.8    203.2     403.0
n1p0     196.9    211.3     408.2
n1p1     192.5    212.6     405.1
n1p2     188.9    201.3     390.2
n1p3     200.5    207.4     407.9
n2p0     215.2    231.8     447.0
n2p1     208.2    216.2     424.4
n2p2     209.8    208.5     418.3
n2p3     221.8    221.5     443.3


Obtain the M. S. for N, P and NP when it is given that the
error variance in this experiment is 27.363. Test the significance
of N, P and NP.

Working with deviations from 200 kgms/plot (so that G = 65.2):

$$\text{C. F.} = \frac{G^2}{N} = \frac{4251.04}{24} = 177.1267$$

$$\text{S. S. (treat.)} = \frac{\sum y_{ij}^2}{2} - \text{C. F.}, \quad \text{where } y_{ij} \text{ are the yields of 2 plots}$$

$$= \frac{7708.56}{2} - 177.1267 = 3854.28 - 177.1267 = 3677.1533$$

$$\text{S. S. (P)} = \frac{\sum P_j^2}{6} - \text{C. F.} = \frac{3429.83}{6} - 177.1267 = 571.6383 - 177.1267 = 394.5116$$

$$\text{S. S. (N)} = \frac{\sum N_i^2}{8} - \text{C. F.} = \frac{6272.64 + 129.96 + 17689.00}{8} - 177.1267$$

$$= \frac{24091.60}{8} - 177.1267 = 3011.45 - 177.1267 = 2834.3233$$

$$\text{S. S. (NP)} = \text{S. S. (treat.)} - \text{S. S. due to (N + P)} = 3677.1533 - (2834.3233 + 394.5116)$$

$$= 3677.1533 - 3228.8349 = 448.3184$$


Now we arrive at the following A. V. T.:

Source of    D. F.   S. S.       M. S. S.        F cal.     F at 5%   F at 1%
variation
P            3       394.5116    131.5039        4.81*      3.59      6.22
N            2       2834.3233   1417.1617       51.79**    3.98      7.00
NP           6       448.3184    74.7197         2.73       3.09      5.07
Error        11      -           27.363 = V_E    -          -         -

Inference: The F-test indicates that the main effect P is significant
at 5% and N is highly significant, while the interaction NP is not
significant.

For comparing the responses exhibited by different levels of P
and N, we compute the C. D. as given below:

(i) S. E. of the difference between two means of P

$$= \sqrt{\frac{2V_E}{3 \times 2}} = \sqrt{\frac{2 \times 27.363}{3 \times 2}} = \sqrt{9.121} = 3.02$$

(C. D.) at 5% = (S. E. of difference) × t.05 (11 d.f.)
             = 3.02 × 2.201 = 6.647 kgms/plot

(ii) S. E. of the difference between two means of N

$$= \sqrt{\frac{2V_E}{4 \times 2}} = \sqrt{\frac{2 \times 27.363}{4 \times 2}} = 2.6154$$

(C. D.) at 5% = (S. E. of difference) × t.05 (11 d.f.)
             = 2.6154 × 2.201 = 5.7565 kgms/plot
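The two critical differences can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `critical_difference` is hypothetical, and the t value 2.201 is simply the tabulated figure quoted in the text, passed in as a constant rather than computed.

```python
import math


def critical_difference(error_ms, plots_per_mean, t_value):
    """Return (S.E. of difference, C.D.) for two means of plots_per_mean plots each."""
    se_diff = math.sqrt(2 * error_ms / plots_per_mean)
    return se_diff, se_diff * t_value


V_E = 27.363          # error variance given in the problem
T_05_11 = 2.201       # tabulated t at 5% for 11 d.f., as used in the text

# Each P mean is based on 3 levels of N x 2 replications = 6 plots.
se_p, cd_p = critical_difference(V_E, 3 * 2, T_05_11)
# Each N mean is based on 4 levels of P x 2 replications = 8 plots.
se_n, cd_n = critical_difference(V_E, 4 * 2, T_05_11)

# Conversion from kgms/plot to kgms/acre for a plot of 1/40 acre.
cd_p_acre = cd_p * 40
cd_n_acre = cd_n * 40
```

The per-acre figures agree with those used in the tables that follow, up to rounding of the intermediate standard errors.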


Effect of super phosphate:

Dose of P        Average yield/plot   Average yield/acre
(in kgms/acre)   (in kgms.)           (in kgms.)
0                203.43               203.43 × 40 = 8137.20
10               199.75               199.75 × 40 = 7990.0
20               198.65               198.65 × 40 = 7946.0
30               209.03               209.03 × 40 = 8361.20

C. D. at 5% for super phosphate means = 6.647 × 40 = 265.88 kgms/acre

The applications of super phosphate at the rates of 10, 20 and
30 kgms/acre do not exhibit significant effects as compared to the control,
while the effect of applying 30 kgms/acre is significantly different
from the effects of the doses 10 and 20 kgms/acre. The maximum
yield has been recorded in the case of 30 kgms/acre.


Effect of ammonium sulphate:

Dose of N        Average yield/plot   Average yield/acre
(in kgms/acre)   (in kgms.)           (in kgms.)
0                190.1                190.1 × 40 = 7604.0
10               201.425              201.425 × 40 = 8057.0
20               216.625              216.625 × 40 = 8665.0

C. D. at 5% for ammonium sulphate means = 5.7565 × 40 = 230.26 kgms/acre

The applications of ammonium sulphate at the rates of 10 and
20 kgms/acre differ significantly from the control. The maximum
yield has been recorded in the case of 20 kgms/acre followed by 10
kgms/acre, and their difference is significant.
S.N.   Factorial Approach / Traditional Approach

1. Factorial approach: It can be used in either case, whether the
   factors are independent or interact with each other. Thus, it can
   be applied to a wide variety of cases.
   Traditional approach: It can be used only when the factors are
   independent. Thus, its use is limited.

2. Factorial approach: It compares all the factors in one and the same
   experiment and thus saves a great deal of time and experimental
   material.
   Traditional approach: It compares different factors in different,
   single, independent experiments and thus requires more time and
   experimental material.

3. Factorial approach: It provides information regarding the main
   effects as well as the interactions.
   Traditional approach: It gives information about the main effects
   only.

4. Factorial approach: It compares all the factors with the same
   precision and renders the results comparable.
   Traditional approach: It compares the different factors with
   different precisions and does not render the results comparable.

5. Factorial approach: It selects the optimum combination.
   Traditional approach: It cannot select the optimum combination.

6. Factorial approach: It compares each factor under the varying
   conditions of the other factors and hence the scope of the
   experiment is greater.
   Traditional approach: It compares each factor by keeping the other
   factors constant and hence the scope of the experiment is limited.

Thus, it is evident that the factorial experiments are of greater
efficiency and comprehensiveness.
EXERCISE VII

Q. 1. Describe the factorial method of experimentation and
explain its advantages.

What factors do you consider worth trial when testing four new
cotton varieties? Describe briefly the design you would choose for
the trial and the layout of the experiment.

(M. Sc. Ag. Agra, 1958)

Q. 2. An experiment is to be conducted for finding the effect
of three levels of irrigation on the yield of four varieties of
potatoes. Suggest a suitable plan for this experiment and give the
skeleton of the analysis of variance table. Also indicate the method for
calculating the sums of squares for the different components.

(M. Sc. Ag. Agra, 1961)

Q. 3. The following table gives the treatment totals for an
experiment with three levels of nitrogen fertilizer and three levels of
phosphate fertilizer. The data are the number of lettuce plants that
emerged from the ground and are totals over 12 plots:

Number of lettuce plants emerging

Levels of      Levels of nitrogen
phosphate      n0      n1      n2      Totals
p0             449     413     326     1188
p1             409     358     291     1058
p2             341     278     312     931
Totals         1199    1049    929     3177

(a) Obtain the m. s. for nitrogen, phosphate and nitrogen
× phosphate.

(b) It is given that the error m. s. in this experiment is about
59. Which of the effects indicate significance?

Ans. (a) m. s. for N = 508.3334, P = 399.8056 and NP = 79.2639
(b) N and P are significant.

Q. 4. An experiment was planned to study the effect of
sulphate of potash and super phosphate on the yield of potatoes. All
combinations of the three levels of super phosphate [0 cwt (p0), 5 cwt
(p1), 10 cwt (p2) per acre] and two levels of sulphate of potash [0 cwt
(k0), 2 cwt (k1) per acre] were studied in a 6 × 6 L. S. The following
yields (lbs/plot, plot size = 1/70 acre) were obtained:


Kvo 

^ 0P2 

6x6 Latin Square 

k 0 Po k 0 p. 

k,Pi 

kjp 2 

186 

187 

208 

222 

296 

331 

kiPi 

koPo 

134 

kxp 2 

k 0 p 2 

koPi 

kiPo 

213 

296 

265 

250 

253 

k 0 Pi 

Mo 

k 0 p 2 

Ma 

k 0 Po 

kiPi 

198 

155 

272 

290 

261 

310 

ktfa 

kiPi 

k»Pi 

kiPo' 

koP 2 

k 0 P# 

233 

184 

218 

234 

248 

293 

k 0 p 2 


kiPi 

koPo 

kjPo 

koPi 

245 

233 

282 

248 

247 

303 

k.Po 

k„Pl 

kiPo 

kiP! 

kjp 2 

k 0 Pa 

196 

228 

242 

255 

273 

294 


Analyse the data and give your conclusions ? 

(M. A. Patna, 1954) 

Ans. V £ =3407835 for 20 d. f. 

V R =820-25 

V c =7876-45^ 

Pj =1363-135 

V K =2288-03 

V p =6996-335 


V Kp =265-44 




CHAPTER VIII 


Confounding 


In factorial experiments, when the number of factors and their
levels is large, i.e. the treatment combinations are numerous, the
blocks require a larger area in comparison to that required for a
smaller number of factors and levels. It is a common experience in
agricultural experimentation that within a large area considerable
soil heterogeneity is present, which increases the experimental error
and lowers the precision of the experiment. Thus, when the
treatments are numerous, the precision of the factorial experiment is
affected adversely. One method of re-introducing homogeneity
within the blocks and of increasing the precision is to partition each
replicate into two or more blocks such that the main effects and the
interactions of interest are tested with a relatively higher precision
than the interactions of little experimental value. This object is
achieved by adopting the artifice of confounding. It consists in
mixing up inseparably the effects of unimportant interactions with the
block effects.

Technique of Confounding in 2³ factorial experiment : In fact,
confounding is not necessary in this case, but it has been chosen
for the convenience of a ready grasp of the technique. Suppose the
three factors are A, B and C, each at 2 levels. The possible treatment
combinations are (1), a, b, ab, c, ac, bc and abc. In order to
confound abc (the three-factor interaction), first we write down its
corresponding algebraic expression

$$(a-1)(b-1)(c-1) = abc - bc - ac + a - ab + b + c - (1)$$

and then randomize the four treatment combinations of +ve sign
(abc, b, c and a) in one block of each replicate and the remaining
four of -ve sign (bc, ac, ab, (1)) in the other block. Each replicate
consists of 2 blocks. In a similar manner, any main effect or
interaction can be confounded with the block effects. The effect of
the confounded interaction is inseparably mixed with the block effect
and hence it cannot be estimated separately. The unconfounded main
effects and interactions are independent of the block effect and can
be estimated separately.
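The sign rule described above can be sketched in Python. This is an illustration only, not the authors' procedure: the function names `contrast_sign` and `confounded_blocks` are hypothetical, and a treatment combination is represented as a tuple of 0/1 levels for the factors a, b, c.

```python
from itertools import product


def treatment_label(levels, factors="abc"):
    """Label such as 'ab' for levels (1, 1, 0); the control is written '(1)'."""
    label = "".join(f for f, lv in zip(factors, levels) if lv)
    return label or "(1)"


def contrast_sign(levels, effect, factors="abc"):
    """Sign of a treatment combination in the expansion of the effect.

    Each factor in `effect` contributes (x - 1), every other factor (x + 1),
    so the sign flips once for every factor of the effect held at level 0.
    """
    sign = 1
    for f, lv in zip(factors, levels):
        if f in effect and lv == 0:
            sign = -sign
    return sign


def confounded_blocks(effect, factors="abc"):
    """Split all 2^k combinations into the +ve and -ve blocks of one replicate."""
    plus, minus = [], []
    for levels in product((0, 1), repeat=len(factors)):
        target = plus if contrast_sign(levels, effect, factors) > 0 else minus
        target.append(treatment_label(levels, factors))
    return plus, minus


plus_block, minus_block = confounded_blocks("abc")
ab_plus, ab_minus = confounded_blocks("ab")
```

Within each block the treatments would still have to be allotted to plots at random; the sketch only decides which combinations go together.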

In the confounding scheme, the precisions of the unconfounded
main effects and interactions are higher than the precisions of
the confounded ones. Hence, relatively unimportant interactions
should be confounded. Generally, the higher order interactions are
deemed to be unimportant either because of their insignificance or
because of the impossibility of applying treatment combinations of
several factors.

Complete confounding : If the same interaction has been
confounded in all the replications, then it is a case of complete or
total confounding and the interaction is said to be completely
confounded. The precision regarding the completely confounded
interaction is sacrificed altogether while that regarding the others
is increased. Complete confounding is chosen only when one
of the interactions is of little importance and we do not require any
knowledge regarding it from the experiment.

Statistical Analysis of completely confounded 2³ factorial Expt. :
The statistical analysis is carried out in the same way as in the case
of factorial experiments, using Yates' method for computing the S. S.
for main effects and interactions. The only modification is that
the S. S. for the completely confounded interaction is neither
computed nor included in the A. V. T. The skeleton of the
A. V. T. when 'abc' is completely confounded is given below:


Source of      D. F.      S. S.   M. S. S.   F cal.   F at 5%   F at 1%
variation
Blocks         2r - 1     ...     ...        ...      ...       ...
Treatments     6          ...     ...        ...      ...       ...
  A            1          ...     ...        ...      ...       ...
  B            1          ...     ...        ...      ...       ...
  AB           1          ...     ...        ...      ...       ...
  C            1          ...     ...        ...      ...       ...
  AC           1          ...     ...        ...      ...       ...
  BC           1          ...     ...        ...      ...       ...
Error          6(r - 1)   ...     ...
Totals         8r - 1










Exp. (1): A manurial trial was carried out on potato with
N, P and K each at 2 levels. The interaction NPK is completely
confounded with blocks. The plan and yields (in seers) are given
below:

      Rep. I               Rep. II              Rep. III
B1        B2         B3        B4         B5        B6
npk 50    np  63     k   55    kp  49     p   63    nk  57
p   45    (1) 47     n   60    np  52     k   55    (1) 55
n   62    nk  57     npk 60    (1) 51     npk 55    np  60
k   45    kp  50     p   56    nk  50     n   70    kp  60

Analyse the data and state your conclusions.
Solution :

Ho : The data is homogeneous with respect to blocks and
treatments.

Taking the deviations from y = 60 seers, we prepare the
following tables to compute the S. S. for blocks, treatments and
error (the squares of the deviations are shown in parentheses):

Treat. \ Blocks    Rep. I       Rep. II      Rep. III     Totals = T
                   B1           B3           B5
npk                -10 (100)    0 (0)        -5 (25)      -15 (125)
p                  -15 (225)    -4 (16)      3 (9)        -16 (250)
n                  2 (4)        0 (0)        10 (100)     12 (104)
k                  -15 (225)    -5 (25)      -5 (25)      -25 (275)
Totals = B         -38 (554)    -9 (41)      3 (159)      -44 (754)
(Totals = B)²      1444         81           9            -

Treat. \ Blocks    Rep. I       Rep. II      Rep. III     Totals = T
                   B2           B4           B6
np                 3 (9)        -8 (64)      0 (0)        -5 (73)
(1)                -13 (169)    -9 (81)      -5 (25)      -27 (275)
nk                 -3 (9)       -10 (100)    -3 (9)       -16 (118)
kp                 -10 (100)    -11 (121)    0 (0)        -21 (221)
Totals = B         -23 (287)    -38 (366)    -8 (34)      -69 (687)
(Totals = B)²      529          1444         64           -

$$\text{C. F.} = \frac{G^2}{N} = \frac{(-113)^2}{24} = 532.0417$$

$$\text{T. S. S.} = \sum_i \sum_j y_{ij}^2 - \text{C. F.} = (754 + 687) - 532.0417 = 908.9583$$

$$\text{S. S. (blocks)} = \frac{\sum_j B_j^2}{4} - \text{C. F.} = \frac{(1444 + 81 + 9) + (529 + 1444 + 64)}{4} - 532.0417$$

$$= \frac{3571}{4} - 532.0417 = 892.75 - 532.0417 = 360.7083$$


Treatment S. S.:

Treat.   Total   I      II     III     Total factorial   S. S.                            due to
comb.    yield                         effect
(1)      -27     -15    -36    -113    G                 (-113)²/24 = 532.0417 = C. F.    -
n        12      -21    -77    65      [N]               (65)²/24 = 176.0417              N
p        -16     -41    50     -1      [P]               (-1)²/24 = 0.0417                P
np       -5      -36    15     -31     [NP]              (-31)²/24 = 40.0417              NP
k        -25     39     -6     -41     [K]               (-41)²/24 = 70.0417              K
nk       -16     11     5      -35     [NK]              (-35)²/24 = 51.0417              NK
pk       -21     9      -28    11      [PK]              (11)²/24 = 5.0417                PK
npk      -15     6      -3     25      [NPK]             Confounded                       NPK

S. S. (treat.) = 176.0417 + ... + 5.0417 = 342.2502
S. S. (error) = T. S. S. - S. S. due to (blocks + treat.)
              = 908.9583 - (360.7083 + 342.2502)
              = 908.9583 - 702.9585 = 205.9998
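The columns I, II and III of the table above follow Yates' sum-and-difference scheme, which can be sketched in Python as below. The function name `yates` is hypothetical; the treatment totals (as deviations from 60 seers) are taken in standard order (1), n, p, np, k, nk, pk, npk.

```python
def yates(totals):
    """Final column of Yates' method for 2^k treatment totals in standard order."""
    col = list(totals)
    k = len(col).bit_length() - 1
    if 2 ** k != len(col):
        raise ValueError("number of totals must be a power of 2")
    for _ in range(k):
        # Top half: sums of successive pairs; bottom half: second minus first.
        sums = [col[i] + col[i + 1] for i in range(0, len(col), 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, len(col), 2)]
        col = sums + diffs
    return col


labels = ["G", "N", "P", "NP", "K", "NK", "PK", "NPK"]
totals = [-27, 12, -16, -5, -25, -16, -21, -15]
r = 3                                   # replications
effect_totals = dict(zip(labels, yates(totals)))

# S.S. of each unconfounded effect = (effect total)^2 / (2^3 * r); NPK is left out
# because it is completely confounded with blocks.
ss = {
    name: effect_totals[name] ** 2 / (8 * r)
    for name in labels[1:]
    if name != "NPK"
}
ss_treat = sum(ss.values())
```

The same function can be reused for any 2^k experiment, provided the totals are supplied in standard order.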
Finally we arrive at the following A. V. T.:

Source of    D. F.   S. S.       M. S. S.    F cal.     F at 5%   F at 1%
variation
Blocks       5       360.7083    72.1417     4.20*      3.11      5.06
Treat.       6       342.2502    57.0417     3.32*      3.00      4.82
  N          1       176.0417    176.0417    10.25**    4.75      9.33
  P          1       0.0417      0.0417      411.75*    244.0     6106.0
  NP         1       40.0417     40.0417     2.33       4.75      9.33
  K          1       70.0417     70.0417     4.08       "         "
  NK         1       51.0417     51.0417     2.97       "         "
  PK         1       5.0417      5.0417      3.406      244.0     6106.0
Error        12      205.9998    17.17       -          -         -
Totals       23      908.9583    -           -          -         -

Inference: The blocks and the treatment effects are significant
at 5% level and the treatment component N is highly significant, while
the other treatment components come out to be insignificant except
P, which is significant at 5% level only.

Partial Confounding : The confounding scheme in which
different interactions are confounded in different replicates is called
partial confounding. Below is given an example of partial confounding
where none of the interactions is confounded in all the replicates:

Rep. I       Rep. II      Rep. III
B1   B2      B3   B4      B5   B6

The interactions ABC, AB and BC are said to be partially
confounded, as each of them is confounded in one replicate only.
In a partial confounding scheme an interaction is confounded in one
or more replicates but not in all.

The above confounding scheme provides full information
regarding the unconfounded effects (A, B, C and AC) and partial
(2/3) information regarding the confounded interactions (ABC, AB
and BC), because the unconfounded effects are estimated from all
the three replicates while each of the confounded interactions is
estimated only from the 2 replicates in which it is not confounded.
Hence, the confounded interactions utilize 2/3 of the observations
while the unconfounded effects utilize the whole data.

Partial confounding is adopted when we want to increase the
precision by partitioning the replicates into two or more blocks and
at the same time desire to obtain information about all the effects,
but are ready to sacrifice a fraction of the information regarding some
interactions. The effects of greater importance are kept unconfounded
while the effects of relatively less importance are confounded
partially.

In the case of partial confounding in a 2³ factorial experiment, the
treatment sums of squares are obtained from Yates' method, finally
adjusting for the confounded effects. The adjusting factor for any
confounded interaction is computed as below:

(i) Note the replicates in which the given interaction is
confounded.

(ii) Note the sign of (1) in the corresponding algebraic expression.
If the sign is +ve, then

adjusting factor = [total of the blocks containing (1) in the replicates
                   in which the interaction is confounded] - [total of
                   the blocks not containing (1) in the replicates in
                   which the interaction is confounded]

and if the sign of (1) is -ve, then

adjusting factor = [total of the blocks not containing (1) in the
                   replicates in which the interaction is confounded] -
                   [total of the blocks containing (1) in the
                   replicates in which the interaction is
                   confounded]

(iii) Subtract the adjusting factor from the total factorial effect,
square the adjusted total and keep the divisor 2³(r - i), where i is
the no. of replicates in which the interaction is confounded.

The whole procedure of statistical analysis in a 2³ partial
confounding experiment is illustrated in the following example:


Exp. (2) : The table given below gives the yields of wheat
(in seers) per plot for a partially confounded 2³ factorial experiment
on 24 plots:

      Rep. I               Rep. II              Rep. III
B1        B2         B3        B4         B5        B6
ab  101   b   88     (1) 125   ab  115    bc  75    a   53
abc 111   a   90     abc 95    c   95     ac  100   abc 76
(1) 75    bc  115    ac  80    bc  90     (1) 55    b   65
c   55    ac  75     b   100   a   80     ab  92    c   82

AB confounded        AC confounded        ABC confounded

Analyse the data and state your conclusions.

Sol. :

Ho : The data is homogeneous with respect to blocks and
treatments.

Taking deviations from y = 87 seers, we prepare the following
table to compute the S. S. for blocks and the T. S. S. (the squares of
the deviations are shown in parentheses):

Treat.        Rep. I                    Rep. II                   Rep. III                   Totals = T
comb.         B1           B2           B3           B4           B5           B6
(1)           -12 (144)    -            38 (1444)    -            -32 (1024)   -             -6 (2612)
a             -            3 (9)        -            -7 (49)      -            -34 (1156)    -38 (1214)
b             -            1 (1)        13 (169)     -            -            -22 (484)     -8 (654)
ab            14 (196)     -            -            28 (784)     5 (25)       -             47 (1005)
c             -32 (1024)   -            -            8 (64)       -            -5 (25)       -29 (1113)
ac            -            -12 (144)    -7 (49)      -            13 (169)     -             -6 (362)
bc            -            28 (784)     -            3 (9)        -12 (144)    -             19 (937)
abc           24 (576)     -            8 (64)       -            -            -11 (121)     21 (761)
Totals = B    -6 (1940)    20 (938)     52 (1726)    32 (906)     -26 (1362)   -72 (1786)    0 (8658)
(Totals = B)² 36           400          2704         1024         676          5184          -

$$\text{C. F.} = \frac{G^2}{N} = \frac{(0)^2}{24} = 0$$

$$\text{T. S. S.} = \sum_i \sum_j y_{ij}^2 - \text{C. F.} = 8658 - 0 = 8658$$

$$\text{S. S. (blocks)} = \frac{\sum_j B_j^2}{4} - \text{C. F.} = \frac{10024}{4} - 0 = 2506$$

For treatment S. S.:


Treat.   Total   I      II     III     Adjusting         Adj. total              S. S.        due to
comb.    yield                         factor            fact. effect
(1)      -6      -44    -5     0       -                 0                       C. F. = 0    -
a        -38     39     5      48      -                 48                      96.00        A
b        -8      -35    23     158     -                 158                     1040.1667    B
ab       47      40     25     66      B1 - B2 = -26     66 - (-26) = 92         529.00       AB
c        -29     -32    83     10      -                 10                      4.4167       C
ac       -6      55     75     2       B3 - B4 = 20      2 - 20 = -18            20.25        AC
bc       19      23     87     -8      -                 -8                      2.6667       BC
abc      21      2      -21    -108    B6 - B5 = -46     -108 - (-46) = -62      240.25       ABC

S. S. (treat.) = 1932.7501

S. S. (error) = T. S. S. - S. S. due to (blocks + treat.)
              = 8658 - (2506 + 1932.7501) = 8658 - 4438.7501
              = 4219.2499
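The adjustment of the three partially confounded interactions in this example can be traced with a small Python sketch. The helper name `adjusted_ss` and its argument names are hypothetical; the block totals and the unadjusted effect totals are the figures from the tables above.

```python
def adjusted_ss(effect_total, block_with_1, block_without_1, sign_of_1, r, i, k=3):
    """Adjusted effect total and S.S. for a partially confounded interaction.

    block_with_1 / block_without_1 are the totals of the blocks that do / do not
    contain (1) in the replicate(s) where the interaction is confounded;
    sign_of_1 is the sign of (1) in the expansion of the interaction.
    """
    if sign_of_1 > 0:
        adjusting_factor = block_with_1 - block_without_1
    else:
        adjusting_factor = block_without_1 - block_with_1
    adjusted_total = effect_total - adjusting_factor
    return adjusted_total, adjusted_total ** 2 / (2 ** k * (r - i))


# Block totals (deviations from 87 seers) from the worked example.
B1, B2, B3, B4, B5, B6 = -6, 20, 52, 32, -26, -72

# AB: confounded in Rep. I, (1) lies in B1, sign of (1) is +ve.
ab_total, ab_ss = adjusted_ss(66, B1, B2, +1, r=3, i=1)
# AC: confounded in Rep. II, (1) lies in B3, sign of (1) is +ve.
ac_total, ac_ss = adjusted_ss(2, B3, B4, +1, r=3, i=1)
# ABC: confounded in Rep. III, (1) lies in B5, sign of (1) is -ve.
abc_total, abc_ss = adjusted_ss(-108, B5, B6, -1, r=3, i=1)
```

The divisor is 2³ × 2 = 16 in each case because every one of these interactions is estimated from only the two replicates in which it is not confounded.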
Finally we prepare the A. V. T.:

Source of    D. F.   S. S.        M. S. S.     F cal.     F at 5%   F at 1%
variation
Blocks       5       2506.00      501.2        1.31                 5.32
Treat.       7       1932.7501    276.1071     1.39                 6.54
  A          1       96.00        96.00        3.995      243
  B          1       1040.1667    1040.1667    2.71       4.84      9.65
  AB         1       529.00       529.00       1.34       "         "
  C          1       4.4167       4.4167       86.844     243
  AC         1       20.25        20.25        18.94      "         "
  BC         1       2.6667       2.6667       143.84     "         "
  ABC        1       240.25       240.25       1.60       "         "
Error        11      4219.2499    383.5682     -          -         -
Totals       23      8658.0       -            -          -         -


Inference : The data is homogeneous. 

S. E. : The standard error of a mean for unconfounded effect 



and that of a mean for confounded interaction 


J 


E 

2 (>-/) 


where 'i' is the no. of replicates in which the interaction is
confounded.

Merits and Demerits : The only advantage of the confounding
scheme lies in the fact that it reduces the experimental error considerably
by partitioning the whole replicate into two or more blocks.

The disadvantages are as follows:

(i) The confounded interactions can be estimated only with lower
precision, as the no. of replications for them is reduced.

(ii) The statistical analysis is complex, especially when some
of the units (observations) are missing.

In a confounding scheme, the increased precision is obtained at
the cost of sacrificing information (partially or completely) on
certain unimportant interactions, which leads to the universal truth
that nothing can be achieved without the sacrifice of something else and
that relatively unimportant things should be the ones sacrificed.

Exercise VIII

Q. (1) : What is confounding? An experiment is to be carried
out with fertilizers N, P and K, each at two levels, on a crop, the
triple interaction between the fertilizers being of no interest. Plan
the layout of the experiment in a confounded design and give a skeleton
analysis of the results. Will you recommend a confounded design
for such a trial? Give reasons.

(M. Sc. Ag. Agra, 1956)

Q. (2) : What is confounding and what are its advantages?
A manurial trial is to be laid out in potato with N, P and K each at
two levels. The interaction PK is to be confounded with blocks. Give
the complete plan for this experiment, taking three replications.
Give also the skeleton analysis of variance.

(M. Sc. Ag. Agra, 1959)

Q. (3) : A factorial experiment involving 3 fertilizers N, P and K,
each at 2 levels, was carried out in 6 replicates and the second order
interaction NPK is completely confounded. The following results
were recorded:

Treat.                 :  (1)   n      p      np     k     nk     pk     npk
Total yield of 6 plots :  971   1106   1219   1045   917   1187   1172   1203   = 8820
(in kgms)

Calculate the m. s. for all the unconfounded effects and test
their significance when the error m. s. is 1175.3304.

Ans. Effect :  N           P           NP          K        NK          PK
     M. S.  :  1430.0833   4370.0833   6256.3333   396.75   2408.3333   147.00

Q. (4) : Analyse the following 2³ partial confounding
experiment:

      Rep. I               Rep. II              Rep. III
B1        B2         B3        B4         B5        B6
abc 114   b   62     a   93    bc  90     ab  72    (1) 75
ab  92    bc  82     b   84    ab  90     bc  83    abc 60
(1) 83    ac  92     c   87    ac  75     c   83    ac  75
c   73    a   52     abc 71    (1) 88     a   100   b   92

AB confounded        ABC confounded       AC confounded

Ans.

Source :  Block   A      B       AB      C      AC     BC     ABC     Error
m. s.  :  19.03   0.67   10.67   68.06   0.17   6.25   8.17   20.25   284.75

Q. (5) : Analyse the following 2² partial confounding
experiment:

      Rep. I               Rep. II              Rep. III
B1        B2         B3        B4         B5        B6
                     (1) 30    p   10
                     n   26    np  10

N confounded         P confounded         NP confounded

Ans.

Source :  Blocks   N       P        NP      Error
M. S.  :           6.125   480.50   0.125   12.0833




CHAPTER IX 


Split Plot Design 


Let us suppose that there are two factors A and B with m and
n levels respectively, and that the factor A cannot be tested on a small
amount of experimental material but requires a large bulk, while the
factor B can be applied to a much smaller amount. In testing these
factors in the same experiment, the simple factorial scheme has to
be modified in such a way that the levels of A are assigned to larger
plots (units), called main plots, and those of B to sub-divisions of
the main plots, called sub-plots. Following are examples of
factors which require large experimental units in agricultural
research:

(i) In field experimentation, factors like sowing date and
irrigation cannot be applied to smaller plots, while factors like
varieties and manuring can be very conveniently applied to the
smaller plots.

(ii) In research on milking machines a relatively large amount of
milk is required, while the methods of cooling and pasteurizing require
smaller amounts of milk.

(iii) In green house studies, the entire green house is used as a
main plot and the several treatments conducted in a green house are
used as sub-plots.

In a S. P. D., the size of the main plot is considerably larger
than that of a sub-plot. Hence the precision of the main plot factor A
will be much smaller than that of the sub-plot factor B and the
interaction AB, but the average precision is the same as in the case of a
simple factorial experiment carried out on the same bulk of
experimental material and with the same no. of replications. It
shows that the increased precision on B and AB is obtained at the
cost of the sacrifice of precision on A. Thus the S. P. D. can be
regarded as a factorial experiment with the main effect A confounded.

Advantages :

(i) The main advantage of the S. P. D. is that two factors which are
dissimilar as regards their requirement of experimental material are
tested in one and the same experiment.

(ii) Increased precision is obtained on the sub-plot factor and
the interaction between the main plot factor and the sub-plot factor.

(iii) The inclusion of an extra factor is possible by dividing each
ultimate plot into a no. of further divisions.

(iv) It saves experimental material in a no. of cases where a
wide border is required between the main plots only.

Disadvantages :

(i) The main plot factor is measured with less precision.

(ii) The statistical analysis is complex, especially when
some units are missing in the data.

Applications : Keeping in view the above mentioned advantages
and disadvantages, we conclude that the S. P. D. is appropriate for the
following situations:

(i) when all the factors are not of equal importance.

(ii) when one factor cannot be tested on small amounts while
the other can be tested.

Randomization (when main plots are arranged in a R. B. D.) :

For the purpose of randomization, the experimental area is
divided into as many blocks as the no. of replicates and then each
block is divided into as many divisions (main plots) as the no. of
levels of the main plot factor. The levels of the main plot factor
are randomly assigned to these divisions in each block separately.
Finally, each main plot is sub-divided into as many sub-divisions
(sub-plots) as the no. of levels of the sub-plot factor. The levels of
the sub-plot factor are randomly allotted to the sub-plots within each
main plot separately.
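The two-stage randomization described above can be sketched in Python. This is only an illustration: the function name `split_plot_layout`, the level labels and the `seed` argument are hypothetical and are used here merely to make the example reproducible.

```python
import random


def split_plot_layout(main_levels, sub_levels, replicates, seed=None):
    """Randomize a split plot design with main plots arranged in blocks.

    Returns one list per block; each entry is (main-plot level,
    randomized order of the sub-plot levels within that main plot).
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(replicates):
        mains = list(main_levels)
        rng.shuffle(mains)               # stage 1: main plots within the block
        block_plan = []
        for main in mains:
            subs = list(sub_levels)
            rng.shuffle(subs)            # stage 2: sub-plots within each main plot
            block_plan.append((main, subs))
        layout.append(block_plan)
    return layout


plan = split_plot_layout(["a0", "a1", "a2"], ["b0", "b1", "b2", "b3"],
                         replicates=4, seed=1)
for block_no, block in enumerate(plan, start=1):
    print("Block", block_no, block)
```

A fresh shuffle is drawn for every block and for every main plot, which mirrors the requirement that the allotment be made separately in each.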

Statistical analysis (.when main plots are arranged in a R.B D. )- 

First we compute the C- F. and the total sum of squares in the 
usual way and the later is 'denoted by T. S. S,. In the next step, we 
compute the sum of squares due to the main factor A, blocks and the 
error against which the effects of A and blocks are tested. This error 
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is denoted by error (a). The computations are made by arranging 
the data in the following tabular way— 


\main factdr 


\ A 
Blocks \ 

<>0 

Ol 

.... Qm— l 

Totals 

*i 

(a»Bi) 

(Mi) 

.... (tf 

to) 

b 3 

• 

(a„B 2 ) 



to) 

k 

(a 0 B r ) 

(°i B r ) 

( a m—\ &r) 

(Br) 

Totals 

(«<) 

(Or) 

.... (flfn— i) 

G 


Where the symbol (a { ) -+ denotes the total yield due to ( th level 

of A and is the sum of ‘nr’ sub- plot yields. (r=0, 1, 2, m— 1) 

(B )) -^denotes the total yield of the ,th block and is a sum of 

‘ttm’ sub-plot yields. (_/=), 2 r) 

(aiBj )-* denotes the total yield due to ,th level of A in the ,th 

block and is a sum of ( n sub-plot yields. 

now, T. S. S 2 F, where C. F.=^- and 

N=rnw 


s. s M)- J , ^-c. r„ 

S. S (block!) ^-c. F„ 

and S. S. due to error ( a)=T . S. S 2 — 5. S. due to (A -(-blocks) 

Finally, S. S. ( AB ) and error (b) ate calculated from the 
following table — 


Levels of B 
Totals 


r 

1 K 

a o 

Levels of A 

• a m— 1 

Totals 

( a obo) 

(*A) 

> ( a m- l^o) 

(M 

1 *1 

(Mi) 

(*A) 


(bo) 

1 : 

1 ^n-l 

W»-i) 

(M«-0 

■ (^m— l^n— l) 

(*-,) 

l 

(a o) 

(<*i) 

• l) 

G 


Using this table,

$$S.S.(B) = \frac{\sum_k (b_k)^2}{rm} - C.F.,$$

$$T.S.S_3 = \frac{\sum_i \sum_k (a_ib_k)^2}{r} - C.F.,$$

$$S.S.(AB) = T.S.S_3 - \text{S. S. due to } (A + B),$$

$$\text{Error (b)} = T.S.S_1 - T.S.S_2 - T.S.S_3 + S.S.(A).$$
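
The whole chain of sums of squares above can be checked with a short Python sketch. It assumes the yields are held as a nested list `y[j][i][k]` (block j, level i of A, level k of B); this data layout, the function name `split_plot_ss` and the tiny data set in the usage lines are hypothetical, chosen only for illustration.

```python
def split_plot_ss(y):
    """Sums of squares for a split plot design; y[j][i][k] = yield in block j,
    main-plot level i of A, sub-plot level k of B."""
    r, m, n = len(y), len(y[0]), len(y[0][0])
    G = sum(v for blk in y for mp in blk for v in mp)
    cf = G * G / (r * m * n)                                     # C.F. = G^2 / N
    tss1 = sum(v * v for blk in y for mp in blk for v in mp) - cf
    tss2 = sum(sum(mp) ** 2 for blk in y for mp in blk) / n - cf  # from (a_i B_j)
    a_tot = [sum(sum(y[j][i]) for j in range(r)) for i in range(m)]
    blk_tot = [sum(sum(mp) for mp in y[j]) for j in range(r)]
    b_tot = [sum(y[j][i][k] for j in range(r) for i in range(m)) for k in range(n)]
    ab_tot = [sum(y[j][i][k] for j in range(r)) for i in range(m) for k in range(n)]
    ss_a = sum(t * t for t in a_tot) / (r * n) - cf
    ss_blocks = sum(t * t for t in blk_tot) / (m * n) - cf
    ss_b = sum(t * t for t in b_tot) / (r * m) - cf
    tss3 = sum(t * t for t in ab_tot) / r - cf                    # from (a_i b_k)
    return {
        "Total": tss1,
        "Blocks": ss_blocks,
        "A": ss_a,
        "Error (a)": tss2 - ss_a - ss_blocks,
        "B": ss_b,
        "AB": tss3 - ss_a - ss_b,
        "Error (b)": tss1 - tss2 - tss3 + ss_a,
    }

if __name__ == "__main__":
    data = [[[3, 5], [4, 8]], [[2, 6], [5, 9]]]   # 2 blocks x 2 levels of A x 2 levels of B
    for source, ss in split_plot_ss(data).items():
        print(source, round(ss, 4))
```

The six component sums of squares always add up to the total sum of squares, which is a convenient check on hand computation.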


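Now we arrive at the following A. V. T.:

Source of variation | D. F. | S. S. | M. S. S.
--------------------|-------|-------|---------
Blocks | $r-1$ | S. S. (blocks) | ...
A | $m-1$ | S. S. (A) | ...
Error (a) | $(m-1)(r-1)$ | S. S. $(E_a)$ | $V_{E_a}$ = S. S. $(E_a)/(m-1)(r-1)$
B | $n-1$ | S. S. (B) | ...
AB | $(m-1)(n-1)$ | S. S. (AB) | ...
Error (b) | $m(n-1)(r-1)$ | S. S. $(E_b)$ | $V_{E_b}$ = S. S. $(E_b)/m(n-1)(r-1)$
Total | $rmn-1$ | $T.S.S_1$ |

S. E.:

The S. E. of the differences between treatment-means are given below: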


Here $V_{E_a}$ and $V_{E_b}$ denote the mean squares of error (a) and error (b).

(1) S. E. of the difference between two A means $= \sqrt{\dfrac{2V_{E_a}}{rn}}$

(2) S. E. of the difference between two B means $= \sqrt{\dfrac{2V_{E_b}}{rm}}$

(3) S. E. of the difference between two B means at the same level of A $= \sqrt{\dfrac{2V_{E_b}}{r}}$, and

(4) S. E. of the difference between two A means at the same or different levels of B $= \sqrt{\dfrac{2[(n-1)V_{E_b} + V_{E_a}]}{rn}}$

In this case, the value of 't' against which the ratio of the difference to its S. E. is to be compared, is given by

$$t = \frac{(m-1)V_{E_b}\,t_b + V_{E_a}\,t_a}{(m-1)V_{E_b} + V_{E_a}},$$

where $t_b$ is the tabulated t for error (b) and $t_a$ the tabulated t for error (a). In
practice it is rarely computed.
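
The four standard errors and the weighted 't' can be evaluated with a small Python sketch. The function names, the argument `weight` (the multiplier of the error (b) variance, written (m-1) in the text) and the numerical values in the usage lines are hypothetical, not taken from the book.

```python
from math import sqrt

def split_plot_se(v_ea, v_eb, r, m, n):
    """S.E. of differences for r blocks, m levels of A (main) and n levels of B (sub)."""
    return {
        "two A means": sqrt(2 * v_ea / (r * n)),
        "two B means": sqrt(2 * v_eb / (r * m)),
        "two B means, same level of A": sqrt(2 * v_eb / r),
        "two A means, same or different B": sqrt(2 * ((n - 1) * v_eb + v_ea) / (r * n)),
    }

def weighted_t(v_ea, v_eb, t_a, t_b, weight):
    """Weighted t for comparison (4); `weight` multiplies the error (b) variance."""
    return (weight * v_eb * t_b + v_ea * t_a) / (weight * v_eb + v_ea)

if __name__ == "__main__":
    # Hypothetical mean squares: error (a) = 20 on 6 d.f., error (b) = 5 on 9 d.f.
    for name, se in split_plot_se(v_ea=20.0, v_eb=5.0, r=4, m=3, n=2).items():
        print(name, round(se, 4))
    print(round(weighted_t(20.0, 5.0, t_a=2.447, t_b=2.262, weight=2), 3))
```

The weighted value always lies between the two tabulated t values, nearer to the one attached to the larger variance term.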

S. P. D. (main plots arranged in R. B. D.) versus factorial
experiment (arranged in R. B. D.)

S. No. | S. P. D. | Factorial experiment
-------|----------|---------------------
1 | The sub-plot factor and the interaction are measured more precisely than the main plot factor but the average precision is the same as in the case of factorial experiment. | All the factors are measured with equal precision.
2 | It is used when all the factors are not of equal importance. | It is used when all the factors are of equal importance.
3 | The size of the plots is according to the necessity of the factors and hence the factors which require larger bulk of experimental material can be tested. | The size of the plots remains the same for all the factors and hence the factors which require relatively large experimental material cannot be tested.
4 | Inclusion of an extra factor is possible without disturbing the original layout. | Inclusion of an extra factor is not possible in the pre-planned layout.
Exp. No. (1): A split plot experiment was laid out in four
replications to study the effect of four sowing dates $D_1, D_2, D_3, D_4$ and
three depths of sowing $d_1, d_2, d_3$ on turmeric. The following yields in
lbs. per plot were obtained:

V 



Rep. I 


Rep 

1 . II 


Rep. 

III 


Rep, 

,IV 


’4 

7-7 

1 

r„, 

22 i 


/4 

8-3 

1 

'4 

4*2 

4 

4 

5.2 

4 

K 

190 

4 

4 

5-4 ' 

4 

4 

7*3 


.4 

2-6 


14 

149 


A 

8-3 


.4 

7*5 


’d \ 

0'9 

I 

’4 

63 


fd 3 

7-3 

1 

’4 

169 

4 

4 

2 T 

4 

4 

39 

4 

1 

14 

| 

7 8 

4 

4 

100 


A 

0-4 


*4 

16 


14 

6*1 


.4 

9-5 


A 

4-2 

1 

f4 

i 

5-3 


r 4 

13 


4 

2-6 

4 

4 

1-3 

^3 j 

i 

14 

63 

4 

| 

14 

1-0 

4 

4 

4*5 


w4 

1-4 


U 

4'3 


| 

14 

1-3 


.4 

4*0 


’4 

6.8 

1 

f 4 

4-9 


'4 

5*5 

I 

'4 

| 

8*8 

4 

4 

69 

, 1 
& 2 

1 

1 4 

40 

4 

• 4 

2-6 

4 

i 

1 4 

i 

6*1 


A 

5-3 


u 

5*1 


_d2 

2T 

1 

1 

l»3 

5*9 


(a) Analyse the data carefully ? 

(b) Give summary tables and work out standard errors and 
critical differences for different comparisons ? 

(c) Give a statement of conclusions ? 


(M. A. Patna, 1953) 





Solution:

$H_0$ : The data are homogeneous with respect to the blocks,
sowing dates and the depths of sowing, and the two factors are
independent.

$$T.S.S_1 = \sum\sum\sum y_{ijk}^2 - C.F. = (7.7)^2 + (5.2)^2 + \cdots + (5.9)^2 - 1713.63$$

$$= 2682.46 - 1713.63 = 968.83$$

To compute the S. S. for blocks (replications), factor 'D' and
the error (a), we prepare a two-way table of the main-plot totals
(dates × replications), from which:
$$T.S.S_2 = \frac{\sum\sum y_{ij}^2}{3} - C.F., \quad \text{where } C.F. = \frac{G^2}{N} = \frac{(286.8)^2}{48} = 1713.63 \text{ and } y_{ij} \text{ is the yield of 3 sub-plots}$$

$$= \frac{7676.72}{3} - 1713.63 = 2558.9067 - 1713.63 = 845.2767$$

$$S.S.(D) = \frac{\sum D_i^2}{12} - C.F. = \frac{(132.6)^2 + \cdots + (29.9)^2}{12} - 1713.63 = \frac{17582.76 + \cdots + 894.01}{12} - 1713.63$$

$$= 2185.4850 - 1713.63 = 471.8550$$

$$S.S.(\text{rep.}) = \frac{\sum R_j^2}{12} - C.F. = \frac{2007.04 + \cdots + 7621.29}{12} - 1713.63$$

$$= 1868.5516 - 1713.63 = 154.9216$$

Error (a) $= T.S.S_2 -$ S. S. due to (dates + replications)

$= 845.2767 - (471.8550 + 154.9216)$

$= 218.5001$
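
The main-plot arithmetic can be re-run in a few lines of Python as a check. The date totals are those of the dates × depths table of this example; the sum of squared main-plot totals (7676.72) and S. S. (rep.) are taken as printed, since the individual main-plot totals are not used here. Variable names are illustrative only.

```python
G, N = 286.8, 48                          # grand total and number of sub-plots
cf = G ** 2 / N                           # correction factor
date_totals = [132.6, 58.7, 65.6, 29.9]   # D_1..D_4, each a total of 12 sub-plots
sum_sq_main_plots = 7676.72               # sum of squared main-plot totals, as printed
ss_rep = 154.9216                         # S.S. (rep.), as printed

tss2 = sum_sq_main_plots / 3 - cf         # each main-plot total covers 3 sub-plots
ss_dates = sum(t * t for t in date_totals) / 12 - cf
error_a = tss2 - ss_dates - ss_rep
print(round(cf, 2), round(tss2, 4), round(ss_dates, 4), round(error_a, 4))
```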

(iii) S. E. of difference between two 'd' means at the same level
of 'D'

$$= \sqrt{\frac{2V_{E_b}}{r}} = \sqrt{\frac{2 \times 3.2581}{4}} = \sqrt{1.62905} = 1.2763$$

(C. D.) at 5% = (S. E. of difference) × $t_{.05}(24)$

$= 1.2763 \times 2.064 = 2.634283 \approx 2.6343$ lbs/plot

(iv) S. E. of the difference between two 'D' means at the same or
different levels of 'd'

$$= \sqrt{\frac{2[(n-1)V_{E_b} + V_{E_a}]}{rn}} = 2.2654$$

(C. D.) at 5% = (S. E. of difference) × $t_{.05}$, where

$$t_{.05} = \frac{(m-1)V_{E_b}\,t_b + V_{E_a}\,t_a}{(m-1)V_{E_b} + V_{E_a}} = \frac{3 \times 3.2581 \times 2.064 + 24.2778 \times 2.262}{3 \times 3.2581 + 24.2778} = \frac{75.0905}{34.0521} = 2.205$$

(C. D.) at 5% $= 2.2654 \times 2.205 = 4.9952$ lbs/plot.
5% 


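The treatment means are given in the following table:

d \ D | $D_1$ | $D_2$ | $D_3$ | $D_4$ | Average 'd'
------|-------|-------|-------|-------|------------
$d_1$ | 13.4 | 5.725 | 5.45 | 3.25 | 6.956
$d_2$ | 10.375 | 4.05 | 5.725 | 1.825 | 5.494
$d_3$ | 9.375 | 4.9 | 5.225 | 2.4 | 5.475
Average 'D' | 11.05 | 4.892 | 5.467 | 2.492 |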
(i) Effect of D (sowing dates) :

The maximum yield has been recorded in the case of $D_1$ followed
by $D_3$ and their difference is significant. The minimum yield is
obtained in the case of $D_4$ which does not differ significantly from $D_2$
and $D_3$. The sowing date $D_1$ is the best of all.

(ii) Effect of d (depths of sowing) :

The maximum yield has been recorded in the case of $d_1$ followed
by $d_2$ and their difference is significant. The depths $d_2$ and $d_3$ have
given almost the same average yield. Thus $d_1$ is the best depth of
sowing.

(iii) Effect of 'dD' :

From the analysis of variance table, we see that the effect of
'dD' is not significant. Thus the two factors are independent.

Conclusion: In order to get the maximum yield, it is recommended
that the turmeric should be sown at the date $D_1$ and depth $d_1$,
i. e. $d_1D_1$ is the optimum combination.


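For calculating the S. S. due to factor 'd', the interaction 'dD' and
the error (b), we prepare the following table of totals over the 4
replications (squares in brackets; in the margins the brackets hold
the sums of the squared cell totals):

depths \ Dates | $D_1$ | $D_2$ | $D_3$ | $D_4$ | Totals = d | (Totals = d)²
---------------|-------|-------|-------|-------|------------|--------------
$d_1$ | 53.6 (2872.96) | 22.9 (524.41) | 21.8 (475.24) | 13.0 (169.00) | 111.3 (4041.61) | 12387.69
$d_2$ | 41.5 (1722.25) | 16.2 (262.44) | 22.9 (524.41) | 7.3 (53.29) | 87.9 (2562.39) | 7726.41
$d_3$ | 37.5 (1406.25) | 19.6 (384.16) | 20.9 (436.81) | 9.6 (92.16) | 87.6 (2319.38) | 7673.76
Totals = D | 132.6 (6001.46) | 58.7 (1171.01) | 65.6 (1436.46) | 29.9 (314.45) | 286.8 = G (8923.38) |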
$$T.S.S_3 = \frac{\sum\sum y_{ik}^2}{4} - C.F., \quad \text{where } C.F. = \frac{G^2}{N} = 1713.63 \text{ and } y_{ik} \text{ is the yield of 4 plots}$$

$$= \frac{(53.6)^2 + \cdots + (9.6)^2}{4} - 1713.63 = \frac{8923.38}{4} - 1713.63$$

$$= 2230.8450 - 1713.63 = 517.2150$$

$$S.S.(D) = \frac{\sum D_i^2}{12} - C.F. = 471.8550 \text{ (from 1st table)}$$

$$S.S.(d) = \frac{\sum d_k^2}{16} - C.F. = \frac{(111.3)^2 + (87.9)^2 + (87.6)^2}{16} - 1713.63 = \frac{12387.69 + 7726.41 + 7673.76}{16} - 1713.63$$

$$= 1736.7412 - 1713.63 = 23.1112$$

S. S. (dD) $= T.S.S_3 -$ S. S. due to (dates + depths)
$= 517.2150 - (471.8550 + 23.1112) = 22.2488$

Error (b) $= T.S.S_1 - T.S.S_2 - T.S.S_3 + S.S.(D)$
$= 968.83 - 845.2767 - 517.2150 + 471.8550$
$= 78.1933$
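
As a check on the sub-plot arithmetic, the same quantities can be recomputed from the dates × depths totals in Python. The $d_1$ totals for $D_2$, $D_3$ and $D_4$ are taken here as 4 × the corresponding treatment means (22.9, 21.8 and 13.0), which is an assumption where the printed table is illegible; $T.S.S_1$ and $T.S.S_2$ are taken as printed. Variable names are illustrative.

```python
# Totals over 4 replications: rows d_1..d_3, columns D_1..D_4.
cells = [
    [53.6, 22.9, 21.8, 13.0],
    [41.5, 16.2, 22.9, 7.3],
    [37.5, 19.6, 20.9, 9.6],
]
r, n_depths, n_dates = 4, 3, 4
G = sum(map(sum, cells))
cf = G ** 2 / (r * n_depths * n_dates)

tss3 = sum(v * v for row in cells for v in row) / r - cf
ss_depths = sum(sum(row) ** 2 for row in cells) / (r * n_dates) - cf
ss_dates = sum(sum(col) ** 2 for col in zip(*cells)) / (r * n_depths) - cf
ss_interaction = tss3 - ss_dates - ss_depths
error_b = 968.83 - 845.2767 - tss3 + ss_dates   # T.S.S_1 and T.S.S_2 as printed
print(round(tss3, 4), round(ss_depths, 4), round(ss_interaction, 4), round(error_b, 4))
```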


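Now we arrive at the following A. V. T.:

Source of variation | D. F. | S. S. | M. S. S. | F cal. | F at 5% | F at 1%
--------------------|-------|-------|----------|--------|---------|--------
Replications | 3 | 154.9216 | 51.6405 | 2.13 | 3.86 | 6.99
D | 3 | 471.8550 | 157.2850 | 6.48* | 3.86 | 6.99
Error (a) | 9 | 218.5001 | 24.2778 $= V_{E_a}$ | | |
d | 2 | 23.1112 | 11.5556 | 3.55* | 3.40 | 5.61
dD | 6 | 22.2488 | 3.7081 | 1.14 | 2.51 | 3.67
Error (b) | 24 | 78.1933 | 3.2581 $= V_{E_b}$ | | |
Totals | 47 | 968.83 | | | |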





(a) From the analysis of variance table, we conclude that the
sowing dates (D) and the depths of sowing (d) both differ significantly
at 5% level of significance and the 2 factors (sowing dates and depths
of sowing) are independent.

(b) (i) S. E. of the difference between two 'D' means

$$= \sqrt{\frac{2V_{E_a}}{rn}} = \sqrt{\frac{2 \times 24.2778}{4 \times 3}} = \sqrt{4.0463} = 2.0115$$

(C. D.) at 5% = (S. E. of difference) × $t_{.05}(9)$

$= 2.0115 \times 2.262 = 4.550013 \approx 4.55$ lbs/plot.

(ii) S. E. of the difference between two 'd' means

$$= \sqrt{\frac{2V_{E_b}}{rm}} = \sqrt{\frac{2 \times 3.2581}{4 \times 4}} = 0.6382$$

(C. D.) at 5% = (S. E. of difference) × $t_{.05}(24)$

$= 0.6382 \times 2.064 = 1.3172448 \approx 1.3173$ lbs/plot.
EXERCISE IX

Q. (1) Describe the split plot design and its advantages ?

The response of wheat to 4 types of composts is to be tested at
3 intensities of irrigation. State how you will lay out the experiment
in a S. P. D., and give a skeleton analysis of variance of results ?

(M. Sc. Ag. Agra, 1956)

Q. (2) What is split plot design ?

An experiment is to be conducted on wheat with 5 dates of
sowing and 3 varieties. Suggest a suitable design for the experiment ?
Set up the analysis of variance and indicate how you would
get the sum of squares for the different components in the analysis of
variance table ?

(M. Sc. Ag. Agra, 1960)

Q. (3) (a) Describe randomized blocks and split-plot designs
in field trials and compare the two for their relative merits and
demerits ?

(b) In a S. P. D., there were 3 main plots with 3 varieties of
paddy. The main plots were replicated 6 times. Each of the 18 main
plots was split in 4 sub-plots on which there were 4 dates of top
dressing of a fertilizer. Construct the analysis of variance table
showing the sources of variation and corresponding degrees of
freedom ? (M. Sc. Ag. Agra, 1964)

Q. (4) An experiment is to be conducted for finding the effect
of the three levels of irrigation on the yield of four varieties of potatoes.
Suggest a suitable plan for this experiment and give the skeleton of
analysis of variance table ? Also indicate the method for calculating
the sum of squares for the different components ?

(M. Sc. Ag. Agra, 1961)

Q. (5) An experiment was conducted for finding the effect of
the 4 levels of green manuring on the yield of 3 varieties of potato
and manure was applied to sub-plots while the varieties were sown in
the main-plots. The following results were recorded:

Treatment | Total yield of 6 plots (in seers)
----------|----------------------------------
$v_1g_1$ | 429
$v_1g_2$ | 538
$v_1g_3$ | 665
$v_1g_4$ | 711
$v_2g_1$ | 480
$v_2g_2$ | 591
$v_2g_3$ | 688
$v_2g_4$ | 749
$v_3g_1$ | 520
$v_3g_2$ | 651
$v_3g_3$ | 703
$v_3g_4$ | 761
Source | D. F. | S. S.
-------|-------|------
Replications | ... | 15875.278
Varieties | ... | ...
Error (a) | ... | 6013.30
Manuring | ... | ...
Interaction | ... | ...
Error (b) | ... | ...
Totals | 71 | 51985.944

Carry out the further analysis and state your conclusions ?

Ans. Blocks and manuring differ significantly.



CHAPTER X 


Switch-over Trials

Or

Cross Over Designs


The experimental designs discussed so far pertain to situations
in which the treatment applied to a particular experimental unit
remains the same for the whole duration (period) of the experiment.
The designs discussed in the present chapter pertain to situations in
which the treatment applied to a particular experimental unit does
not remain the same for the whole duration of the experiment.
Instead, the whole duration is divided into as many fractional periods as the
no. of treatments and the treatments are assigned randomly to these
fractional periods. Since the treatments are switched in sequence
over several fractional periods, the design is called a switch-over
design.

Applications : This design is used in dairy husbandry,
biological, psychological and marketing research.

Advantages : (1) This design is very useful for the situations
where the effect of the treatment varies with the time and the whole
period can be divided into different fractional periods according to
these effects.

(2) This design estimates the treatment effect over a short
period of time and controls the fluctuations due to the time.
Randomization : Since all the treatments are applied to each
experimental unit, each unit is considered as a full replicate.
For the purpose of randomization, the whole duration of the
experiment is divided into as many fractional periods as the no. of
treatments and the treatments are randomly allotted to these fractional
periods within each replicate with the restriction that each treatment
is to be given an equal no. of times in each fractional period. It
requires that the no. of replicates must be an exact multiple of the
no. of treatments. The treatments are arranged in a R. B. D. or
L. S. D.
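
One simple way to satisfy the restriction stated above is sketched below in Python. It is only one possible scheme, and a simplification: it draws the sequences from the k cyclic orders of a randomly ordered treatment list and allots them to the replicates at random, so not every admissible plan can arise. The function name `switch_over_plan` is hypothetical.

```python
import random

def switch_over_plan(treatments, n_replicates, seed=None):
    """Return plan[replicate][period]; each treatment occurs equally often in each period."""
    k = len(treatments)
    if n_replicates % k:
        raise ValueError("no. of replicates must be an exact multiple of the no. of treatments")
    rng = random.Random(seed)
    labels = list(treatments)
    rng.shuffle(labels)
    # The k cyclic sequences together contain every treatment once in every period.
    base = [[labels[(start + period) % k] for period in range(k)] for start in range(k)]
    plan = base * (n_replicates // k)
    rng.shuffle(plan)                     # sequences allotted to replicates at random
    return plan

if __name__ == "__main__":
    for replicate, sequence in enumerate(switch_over_plan(["A", "B"], 8, seed=3), start=1):
        print("Replicate", replicate, sequence)
```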

Following is the plan for two treatments (A and B) and 8
replicates in a R. B. D.:

Period \ Replicate | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
-------------------|---|---|---|---|---|---|---|---
I | B | A | B | A | A | B | B | A
II | A | B | A | B | B | A | A | B






If the experiment has to be conducted in a L. S. D., the plan
will be of the following form, with four 2 × 2 squares (Square I,
Square II, Square III and Square IV), each arranged as:

A | B
B | A
Statistical Analysis : For a switch-over design arranged in a
R. B. D. with 'n' replicates and 'k' treatments, the breakdown of the
d. f. is as follows:

Source of variation | D. F.
--------------------|------
Replicate | $n-1$
Period | $k-1$
Treatment | $k-1$
Error | $(n-2)(k-1)$
Totals | $nk-1$

The computations of the sum of squares are shown in the
following example.

Exp. (1). Two rations A and B were administered to 8 dairy
cows. Each cow received ration A and B in period I (first half of
the lactation period) and period II (second half of the lactation
period). The rations A and B were allotted to the two periods at random
with the restriction that half of the cows received the ration A and the
other half the ration B in each period. The experimental design
and the milk yield (in seers) are given in the following table:

Periods \ Cows | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
---------------|---|---|---|---|---|---|---|---
I (ration) | B | A | B | A | B | B | A | A
I (yield) | 25 | 23 | 15 | 20 | 15 | 15 | 14 | 20
II (ration) | A | B | A | B | A | A | B | B
II (yield) | 15 | 11 | 15 | 15 | 15 | 12 | 6 | 14

Analyse the data and interpret the results ?
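Solution: Taking the deviations of the yields from 15 (which leaves
every sum of squares unchanged), we have $G = 10$ and

$$C.F. = \frac{G^2}{N} = \frac{100}{16} = 6.25$$

$$T.S.S. = \sum_{i=1}^{2}\sum_{j=1}^{8} y_{ij}^2 - C.F. = 322 - 6.25 = 315.75$$

$$\text{S. S. due to cows} = \frac{\sum_j C_j^2}{2} - C.F. = \frac{100 + 16 + \cdots + 16}{2} - 6.25 = 133 - 6.25 = 126.75$$

$$\text{S. S. due to periods} = \frac{\sum_i P_i^2}{8} - C.F. = \frac{729 + 289}{8} - 6.25 = 127.25 - 6.25 = 121.00$$

$$\text{S. S. due to rations} = \frac{T_A^2 + T_B^2}{8} - C.F. = \frac{196 + 16}{8} - 6.25 = 26.5 - 6.25 = 20.25$$

S. S. due to error = T. S. S. − S. S. due to (cows + periods + rations)

$= 315.75 - (126.75 + 121.00 + 20.25)$
$= 47.75$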
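
The same sums of squares can be reproduced directly from the recorded yields with a short Python sketch; subtracting a constant from every yield changes the correction factor but none of the sums of squares, so the uncoded data are used here. The function name `switch_over_ss` and the data layout (one row per period, one column per cow) are illustrative choices.

```python
def switch_over_ss(treat, y):
    """treat[p][j], y[p][j]: ration and yield of cow j in period p."""
    periods, cows = len(y), len(y[0])
    N = periods * cows
    G = sum(map(sum, y))
    cf = G * G / N
    total = sum(v * v for row in y for v in row) - cf
    ss_cows = sum(sum(col) ** 2 for col in zip(*y)) / periods - cf
    ss_periods = sum(sum(row) ** 2 for row in y) / cows - cf
    ration_totals = {}
    for t_row, y_row in zip(treat, y):
        for t, v in zip(t_row, y_row):
            ration_totals[t] = ration_totals.get(t, 0) + v
    per_ration = N / len(ration_totals)          # observations behind each ration total
    ss_rations = sum(v * v for v in ration_totals.values()) / per_ration - cf
    return {"Total": total, "Cows": ss_cows, "Periods": ss_periods, "Rations": ss_rations,
            "Error": total - ss_cows - ss_periods - ss_rations}

treat = ["BABABBAA", "ABABAABB"]
milk = [[25, 23, 15, 20, 15, 15, 14, 20],
        [15, 11, 15, 15, 15, 12, 6, 14]]

if __name__ == "__main__":
    for source, ss in switch_over_ss(treat, milk).items():
        print(source, ss)
```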

Now we arrive at the following A. V. T.:

Source of variation | D. F. | S. S. | M. S. S. | F cal. | F at 5% | F at 1%
--------------------|-------|-------|----------|--------|---------|--------
Cows | 7 | 126.75 | 18.1071 | 2.275 | 4.21 | 8.26
Periods | 1 | 121.00 | 121.00 | 15.204** | 5.99 | 13.74
Rations | 1 | 20.25 | 20.25 | 2.545 | 5.99 | 13.74
Error | 6 | 47.75 | 7.9583 | | |
Totals | 15 | 315.75 | | | |

Inference : The two rations A and B do not differ significantly
while the two periods differ significantly at 1% level.


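If necessary, the S. E. of the difference between the two
treatment-means is given by

$$\text{S. E. of difference} = \sqrt{\frac{2V_E}{n}},$$

where $V_E$ is the error mean square and $n$ the no. of replicates.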









EXERCISE X

Q. (1) : What is a switch-over trial ?

Give in detail the plan for a feeding trial conducted with the object
of testing the effect of three different rations on the milk-yield of
Hariana cows. Indicate briefly the method of analysis of the data ?

(M. Sc. Ag. Agra, 1961)

Hint : Let the three rations be A, B and C. The whole
lactation period will be divided into 3 fractional periods I, II and III.
In this case, the no. of replications may be 6 or 9 or 12 etc., an
exact multiple of 3. Suppose we start with 6 replications, then the
plan will be of the following form:

Periods \ Cows | 1 | 2 | 3 | 4 | 5 | 6
---------------|---|---|---|---|---|---
I | A | A | B | C | C | B
II | C | B | C | B | A | A
III | B | C | A | A | B | C

The breakdown of d. f. will be as follows:

Source of variation | D. F.
--------------------|------
Cows | 5
Periods | 2
Rations | 2
Error | 8
Totals | 17



CHAPTER XI 


Progeny Row Trials 

And 

Compact Family Block Designs 


The selection of plants for further propagation is of paramount
importance in plant breeding work. In old days, the method of mass
selection was in use for the selection of such plants. The method
consists in choosing from the material under selection a no. of plants
that appear to be superior in respect of the character or characters
under selection, bulking the seed from these selected plants, raising
from the seed a next generation and again choosing the superior
plants from this generation and repeating the same operation as
before. The main drawback of this method was that in each generation
the selection is subject to environmental or non-genetic variability
present in the field. Due to this drawback, this method was inefficient
and defective.

After the development of the basic principles of experimental
designs, the necessity of objective testing by the application of
randomization and replication was realized. The replicated experiments
require a comparatively large amount of seed and experimental
material. But the amount of seed produced by the selected plants
remains small, which presented a difficulty in the application of
randomization and replication. Another difficulty in conducting the
replicated experiments was that the genetic variation (due to the
heterogeneity of the plant breeding material) enters the variation due
to error. These two difficulties are successfully overcome by adopting
the method of 'Progeny Row Trials'. This method is statistically
sound, being based on the principles of randomization and replication
and with extremely small plot-size. The present method consists in
sowing the seeds of different selected plants in separate rows and
selecting that progeny which seems to be superior on the average. In
this method, the average performances of progenies are taken into
account while the method of mass selection depends upon the
performance of the individual plants. It is a well known fact that
the mean is subject to only a fraction of the environmental variation to
which the individual plants are subjected. Hence this method is more
reliable.

The progeny row trial is a randomized block design (R. B. D.)
in which each plot consists of 3 rows. All the seeds of a plot belong
to the same parent plant (progeny). The middle row is called the
progeny or experimental row and the two side rows are called the
guard rows. The guard rows avoid the border-effect on the
experimental row.

Statistical Analysis: Suppose we have 'p' progenies and 'r'
replications. Then the breakdown of the d. f. will be as follows:

Source of variation | D. F.
--------------------|------
Replication | $(r-1)$
Progeny | $(p-1)$
Error | $(r-1)(p-1)$
Total | $rp-1$

The computations of the sum of squares due to different sources
and test of significance are made exactly in the same way as in the case
of R. B. D.
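
Since the analysis is that of an ordinary R. B. D., it can be sketched in Python as below. The function name `rbd_anova` and the small data set (3 replications × 4 progenies, experimental rows only) are hypothetical and serve only to show the computation.

```python
def rbd_anova(y):
    """y[j][i]: value of progeny i in replication j (experimental row only)."""
    r, p = len(y), len(y[0])
    G = sum(map(sum, y))
    cf = G * G / (r * p)
    total = sum(v * v for row in y for v in row) - cf
    ss_rep = sum(sum(row) ** 2 for row in y) / p - cf
    ss_prog = sum(sum(col) ** 2 for col in zip(*y)) / r - cf
    ss_err = total - ss_rep - ss_prog
    df = {"Replication": r - 1, "Progeny": p - 1, "Error": (r - 1) * (p - 1)}
    f_progeny = (ss_prog / df["Progeny"]) / (ss_err / df["Error"])
    return {"df": df, "ss_replication": ss_rep, "ss_progeny": ss_prog,
            "ss_error": ss_err, "ss_total": total, "F_progeny": f_progeny}

if __name__ == "__main__":
    rows = [[10, 12, 15, 11],
            [11, 13, 16, 12],
            [9, 12, 17, 10]]
    print(rbd_anova(rows))
```

The F value for progenies is then compared with the tabulated F for $(p-1)$ and $(r-1)(p-1)$ d. f.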

If the F-test indicates that the different progenies differ significantly,
they are arranged in the descending order of their means.
The progeny with maximum mean is selected for further propagation
provided it differs significantly from the remaining progenies. In the
case when the maximum mean value does not differ significantly
from one or more mean values, the progeny with the greatest
plant error (the variance between the plants of the same progeny) is
selected from those which do not differ significantly.

Compact Family Block Design: In progeny row trial, all the
progenies to be compared belong to the same family. However, if the
progenies to be tried belong to different families then the compact
family block design, which is analogous to the Split Plot Design
(S. P. D.), is used to compare:

(1) Different families,

(2) Progenies belonging to the same family, and

(3) Progenies belonging to the different families.

Randomization: In this design, the families are randomized in
the main plots and the progenies of the same family are randomly
allotted to the progeny sub-plots within the main plots.

Statistical Analysis: Suppose we have 'f' families
$(F_1, F_2, \ldots, F_f)$, 'p' progenies in each family and 'r' replications. Then the
above mentioned 3 comparisons are made in the following way:

(1) Comparison between families: To have a comparison between
the families, the main-plot data are analysed. The breakdown of
d. f. is as follows:

Source of variation | D. F.
--------------------|------
Replications | $r-1$
Families | $f-1$
Error | $(r-1)(f-1)$
Totals | $rf-1$

The computations of sum of squares and the test of significance
are made exactly in the same way as the main plot data are analysed
in the case of Split Plot Design.

The S. E. of the difference between the 2 family means

$$= \sqrt{\frac{2 \times \text{Error variance between families}}{rp}}$$








(2) Comparison between progenies within families: For this
purpose, the various families are analysed separately and tested
against the error obtained from the data of that family. The results
are arranged in the following tabular form:

Source of variation | D. F. | S. S. | M. S. S. | F cal. | F at 5% | F at 1%
--------------------|-------|-------|----------|--------|---------|--------
Replications | $(r-1)$ | ... | ... | ... | ... | ...
Progenies | $(p-1)$ | ... | ... | ... | ... | ...
Error | $(r-1)(p-1)$ | ... | ... | | |
Totals | $(rp-1)$ | ... | | | |

The S. E. of the difference between the two progeny means
within the same family

$$= \sqrt{\frac{2 \times \text{Error variance between the progenies within the same family}}{r}}$$


(3) Comparison between progenies belonging to different
families: If the within family error variances are homogeneous then
the different families and their progenies are compared with the help
of the following pooled analysis of variance table:

Source of variation | D. F. | S. S. | M. S. S. | F cal. | F at 5% | F at 1%
--------------------|-------|-------|----------|--------|---------|--------
Replications | $(r-1)$ | ... | ... | ... | ... | ...
Families | $(f-1)$ | ... | ... | ... | ... | ...
Error (a) | $(r-1)(f-1)$ | ... | $V_{E_a}$ | | |
Progenies within families | $f(p-1)$ | ... | ... | ... | ... | ...
Error (b) | $f(p-1)(r-1)$ | ... | $V_{E_b}$ | | |
Totals | $frp-1$ | T. S. S. | | | |

S. S. due to progenies within families and error (b) are computed
by adding the corresponding sums of squares obtained for different
families.

The S. E. of the difference between the two progeny means
belonging to different families

$$= \sqrt{\frac{2[(p-1)V_{E_b} + V_{E_a}]}{rp}}$$


Advantages of Compact Family Block Designs:

(1) The main advantage of this design lies in the fact that the
progenies of a family are sown side by side in a family main plot.
Hence the progenies of a family experience the same type of
environment, and variation within families is minimum.

(2) When the analysis of variance indicates that certain families
are inferior to others, further analysis of these families is not essential.
This saves a considerable amount of time and labour involved in the
analysis when a large number of families is tried.
EXERCISE XI

Q. (1) Describe the compact family block design for plant
breeding trials and discuss its advantages. Give the skeleton of analysis
of variance of such a trial ?

(M. Sc. Ag. Agra, 1958)

Q. (2) Write short notes on:

(i) Replicated progeny row trials,

(M. Sc. Ag. Agra, 1956)

(ii) Compact family block designs,

(M. Sc. Ag. Agra, 1960, 64)



CHAPTER XII


Rotational Experiments 


Certain agronomic experiments are conducted on the same
experimental site for a number of years to compare several crop
rotations or sequences of agronomic practices. These experiments are
called the Rotational Experiments.

For illustration of the procedure of randomization and statistical
analysis, let the different rotations to be compared be:

$R_1$ (Lubia-wheat)

$R_2$ (Moong-wheat)

$R_3$ (Dhancha for green manuring-wheat)

....

$R_v$ (no cropping from April to October-wheat)

and the experiment be repeated for 'n' years in 'r' blocks.

Randomization: The experimental site will be divided into 'r'
blocks and each block will be divided into 'v' plots. Then 'v' rotations
will be randomized in each block separately. The experiment will
continue for 'n' years.

Statistical Analysis: The data for each year will be analysed
separately. The breakdown of the degrees of freedom for the data of
a single year is as follows:

Source of variation | D. F.
--------------------|------
Blocks | $(r-1)$
Rotations | $(v-1)$
Error | $(r-1)(v-1)$
Total | $rv-1$
There will be 'n' analysis of variance tables, one for each year.
If the error variances are homogeneous then the pooled data can be
analysed. The breakdown of the d. f. will be as follows:

Source of variation | D. F.
--------------------|------
Blocks | $n(r-1)$
Years | $n-1$
Rotations | $v-1$
Interaction (R×Y) | $(v-1)(n-1)$
Error | $n(r-1)(v-1)$
Total | $nrv-1$

The sum of squares due to blocks and error will be obtained by
adding the corresponding S. S. obtained from the individual analysis
of variance tables and the S. S. due to rotations, years and their
interaction will be obtained by arranging the data in the following
v × n table:

Rotation \ Year | $Y_1$ | $Y_2$ | .... | $Y_n$ | Totals
----------------|-------|-------|------|-------|-------
$R_1$ | $(R_1Y_1)$ | $(R_1Y_2)$ | .... | $(R_1Y_n)$ | ...
$R_2$ | $(R_2Y_1)$ | $(R_2Y_2)$ | .... | $(R_2Y_n)$ | ...
.... | .... | .... | .... | .... | ....
$R_v$ | $(R_vY_1)$ | $(R_vY_2)$ | .... | $(R_vY_n)$ | ...
Totals | ... | ... | .... | ... | $G$
The rest of the procedure is the same as in the case of (v × n) factorial
experiments.
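
The pooled breakdown of the degrees of freedom can be checked for any n, r and v with a small Python sketch; the function name `pooled_rotation_df` and the values in the usage line are illustrative only.

```python
def pooled_rotation_df(n_years, r_blocks, v_rotations):
    """Degrees of freedom of the pooled analysis over n years, r blocks, v rotations."""
    n, r, v = n_years, r_blocks, v_rotations
    df = {
        "Blocks": n * (r - 1),                 # (r - 1) from each yearly table
        "Years": n - 1,
        "Rotations": v - 1,
        "Interaction (R x Y)": (v - 1) * (n - 1),
        "Error": n * (r - 1) * (v - 1),        # (r - 1)(v - 1) from each yearly table
    }
    total = n * r * v - 1
    assert sum(df.values()) == total           # the components must exhaust the total
    df["Total"] = total
    return df

if __name__ == "__main__":
    print(pooled_rotation_df(n_years=3, r_blocks=4, v_rotations=4))
```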

If the rotations differ significantly, then the S. E. and C. D. 
will be computed for selecting the best rotation. 

The S. E. of the difference between the two rotation means



Advantages— (1) These experiments are conducted to select 
the best rotation for a given locality. 

(2) The rotational experiments are also used to compare the 
agronomic practices on a fixed rotation of crops. 

EXERCISE XII 

Q. (1) Describe in short “Simple Rotational Experiments" 
and give the procedure of randomization and statistical analysis by 
considering a suitable example ? Also mention the object of these 
experiments ? 




Part III 

Official Agricultural Statistics 




Chapter I 

Official Agricultural Statistics 

Definition : Official Agricultural Statistics is defined as the 
aggregate of quantitative information bearing on the different 
fields of agriculture and its economy. 

Classification : It covers a very wide field which can be 
classified as follows : — 

(i) Land Utilization Statistics, 

(ii) Agricultural Production Statistics including Live-Stock 
and Fisheries, 

(iii) Agricultural Price and Wage Statistics, 
and (iv) Ancillary Agricultural Statistics. 

Importance : In a country like India, which is predominantly
an agricultural country, the collection of agricultural statistics is
of paramount importance, since these statistics are directly
connected with the rural economy and the formulation of the food
policy, which is of utmost importance to the whole country. Most
of the original agricultural statistics of our country are collected in
connection with the area and yield of different crops and
crop-forecasting. They are very helpful to the government in the
formulation of agricultural development plans and food policies,
measurement of the effect of the past development policies and
collection of land revenue. They are also useful to the traders
and general public as they help in stabilizing the prices of
agricultural commodities. The collection of live-stock and fisheries
statistics is of equal importance as they solve a good deal of the food
problem. Keeping all these facts in view, we conclude that the
collection of statistics regarding the crop-acreage and production,
live-stock and their products and fisheries is of primary importance.

Agency for Collection : In India, the importance of agricultural
statistics has all through been realized. The statistics pertaining
to agriculture are some of the earliest statistics that we have.
Kautilya's Arthashastra as also the Ain-i-Akbari provide a lot of
information regarding the population, acreage of crops and prices
of commodities etc. The modern history dates back to the year
1875 when the department of Agriculture and Commerce was set
up in Uttar Pradesh and later on in 1881 similar departments were
opened in other provinces as a result of the recommendations of
the Famine Commission of 1880. After the First World War,
certain improvements were made in the system of collection of
agricultural statistics. The coming of independence led to further
demand for the co-ordination of the statistical material relating
to agriculture. At present, all work relating to the collection,
compilation and publication of agricultural statistics is carried out
by the Directorate of Economics and Statistics in the Central
Ministry of Food and Agriculture. The following are some of
the important regular publications of this directorate:

(i) Abstract of Agricultural Statistics of India, 

(ii) Indian Agricultural Statistics (Vol. I and II),

(iii) Estimates of area and production of principal crops in 
India (Vol. I and II), 

(iv) Agricultural prices in India, 

(v) Agricultural wages in India, 

(vi) Bulletin of Food Statistics, 

and (vii) Indian agriculture in brief.

The research work for the improved methodology in the field 
of agricultural statistics is carried out by the statistics branch of 
Indian Council of Agricultural Research (I. C. A. R.). It works 
in close co-operation with the directorate. 

With a view to collecting comprehensive information relating to
all sections of national economy, the Directorate of National Sample
Survey (N.S.S.) under the department of economic affairs, ministry
of finance, was established in June, 1950. Since 1953 the N.S.S.
has taken over the work of large scale sample survey in the field
of agricultural statistics which was previously conducted by
I.C.A.R. In U. P., the work relating to agricultural statistics is
carried out by the directorate of agriculture. The districtwise
information relating to land utilization, area and yield of principal
crops and some other useful information is published annually
in the bulletin of agricultural statistics for Uttar Pradesh.
EXERCISE No. (1)

Q. No. (1) Write a short note on the importance of
agricultural statistics for any country ? (M. Sc. Ag. Agra 1965)

Q. No. (2) Name six important official publications relating 
to agricultural statistics in India ? 

Q. No. (3) What are the agencies for the collection of 
agricultural statistics ? 



Chapter II 


Land Utilization Statistics

Introduction : The statistics relating to land utilization have
been collected since 1884. In U. P., districtwise detailed statistics
of land utilization, area and production of crops, live-stock,
agricultural prices and other useful information with a description
of the weather and crop conditions are published in the "Season and
Crop Report of U. P.". It is an annual publication, published by
the Board of Revenue, U. P.

Method of collection in temporarily settled areas : In the 
temporarily settled areas like U. P., detailed crop records and 
statistics are kept by the village accountant (called Lekhpal in 
U. P., talathi in Maharashtra, karnam in the South and karamchari in 
Bihar) and supervised by his immediate officer, the kanungo. For 
better supervision and random checking, some state governments 
(U. P. is one of them) have appointed District Statistical Officers 
(D. St. O.). The statistics thus obtained are fairly reliable. 

Method of collection in permanently settled areas : In 
permanently settled areas like Bihar and Bengal, no detailed crop 
records are required to be kept for revenue purposes, and this work 
is entrusted to the police chokidars of the respective villages or 
to the village headmen. These statistics are mere guesswork and 
hence they are neither reliable nor accurate. 

Classification : Land utilization statistics are classified under 
the following heads — 

(1) total area and classification of area, 

(2) irrigated area, 
and (3) cropwise area. 

(1) (a) Total area : The geographical area is furnished by the 
village returns prepared by the Lekhpal. It is exclusive of corporation, 
municipal and town areas. These areas are based on the cadastral 
survey carried out by the state government. 

(b) Classification of area : In 1949-50, the area was classified 
into the following nine classes — 

(i) Forest : This class includes all forested areas on the land 
or administered as forest under any legal enactment dealing with 
forests whether state owned or private. 

(ii) Barren and unculturable land : It stands for all barren 
and unculturable land like mountains, usar land etc. which cannot 
be brought under cultivation. 

(iii) Land put to non-agricultural uses : It gives the area 
covered with water, sites, roads, buildings, cremation ground etc. 
and all other lands put to the uses other than agricultural. 

(iv) Cultivable waste : This category covers the land available 
for cultivation but not taken up for cultivation or abandoned after 
a few years for one reason or the other. 

(v) Permanent pastures and other grazing lands : This heading 
includes all the grazing lands whether or not they are permanent 
pastures and meadows. 

(vi) Land under miscellaneous trees, groves, not included in 
area sown : It covers all culturable land which is not included in the 
area sown but is put to some agricultural use. Lands under 
groves, forests of timber and fuel trees, shrubs, bushes etc. which 
are not included under orchards are kept in this category. 

(vii) Current fallows : This category comprises cropped areas, 
which are kept fallow during the current year. 

(viii) Other fallow lands : It includes all the lands which are 
taken up for cultivation but are temporarily out of cultivation for a 
period of not more than five years. 

(ix) Net area sown : It has the net area sown with crops and 
orchards. 

(2) Irrigated area : The irrigated area is classified according 
to: 

(a) source of irrigation, 
and (b) crops irrigated. 

The data under category (a) stand for the net area irrigated while 
those under (b) represent the gross irrigated area. 

(3) Cropwise area : Fairly detailed information is collected 
relating to acreage under important crops. The total cropped area 
is divided into: 

(a) area under food crops, 

and (b) area under non-food crops. 

The former group is further sub-divided into cereals, pulses, 
sugarcane, condiments and spices, fruits and vegetables including 
root-crops, and others. The latter is further classified into oil seeds, 
fibres, dyes and tanning materials, drugs and narcotics, fodder 
crops, green manure crops and others. 

Causes of inaccuracies : Following factors are responsible for 
inaccuracies in the collection of area statistics— 

(i) lack of training and efficiency of the Lekhpal, 

(ii) heavy work load and low salary of the Lekhpal, 

(iii) lack of supervision and random checking, 

(iv) errors committed in the compilation of figures, 

(v) use of mixed crops (no definite formula can be given 
for the accurate estimation of the acreage under mixed 
crops, as the composition of the crops changes very 
often), 

(vi) use of fixed ridges (the field ridges, which are neither 
sown nor cropped, are also included in the estimate 
of area), 

(vii) lack of definition of acreage under crop (it is not 
certain whether acreage under crop means the area 
sown or the area successfully cropped), 

(viii) method of collection adopted in permanently settled 
areas. 

Suggestions for the improvement of the statistical data : 
The following measures have been suggested for improvement of the 
statistical data. 

(i) Steps should be taken to train the lekhpals in the 
technique of data collection. 

(ii) The jurisdiction of the lekhpal should be reduced and 
he must be better paid. 

(iii) The supervisors should pay surprise visits to verify 
the accuracy of the work of their subordinates. 

(iv) A certain percentage of the area should be considered as 
occupied by the ridges and the total acreage figures 
should be adjusted accordingly. 

(v) For the accurate estimation of acreage under mixed 
crops, the method of crop cutting experiments should be 
followed. 

(vi) In permanently settled areas, the random sampling method 
should be adopted to obtain reliable statistics. 

EXERCISE No. 2 

Q. No. (1) Write an essay on Land-utilization Statistics. 

Q. No. (2) Name the classifications of Land-utilization 
Statistics and explain the defects of collecting the area statistics. 
Also supply some suggestions for improvement in the collection 
of the area statistics. 



Chapter III 

Method of Estimating Crop-Yield 


There are two methods of estimating the crop yield — 

(i) Annawari Estimation : According to this method, the yield 
of a crop in a year for a district is estimated by the following 
formula: 

$$\text{District yield } (y_d) = \sum \left[\text{Area} \times \text{Normal yield} \times \text{Anna condition of the crop}\right]$$

Area : The acreage under a crop is furnished by the village 
papers prepared by the lekhpal and supervised by the supervisor 
kanungo. The area statistics thus furnished are fairly accurate 
in this state (U. P.). 

Normal yield : It is the average yield on average soil in a 
year of average character, as deduced from a consideration of the 
information obtained from the experiments made during the year under 
review. The state department of agriculture is responsible for the 
estimation of the normal yield, which is obtained on the basis of 
crop-cutting experiments. The method consists in selecting some average 
plots and sowing and harvesting the crop on these plots. 

Anna-condition of the crop : Taking the normal yield as 16 
annas, the condition of a crop in a particular year can be described 
in relation to the normal yield in terms of annas. For instance, 
if a crop seems to be 3/4 of the normal crop then its yield is taken 
as 12 annas. The anna-condition of the crop is based merely on 
eye inspection by the crop reporter. 
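
The arithmetic of the annawari formula can be sketched as below. This is only an illustration: it assumes that the anna-condition enters the product as a fraction of 16 (so 12 annas counts as 12/16) and that the district figure is the sum over reporting units such as tehsils. The function names and all the figures are hypothetical.

```python
# Annawari estimate for one unit: area x normal yield x (anna condition / 16).
# All names and figures here are hypothetical, for illustration only.

ANNAS_IN_NORMAL = 16  # the normal crop is taken as 16 annas


def annawari_yield(area_acres, normal_yield_per_acre, anna_condition):
    """Estimated production of one reporting unit."""
    return area_acres * normal_yield_per_acre * anna_condition / ANNAS_IN_NORMAL


def district_yield(units):
    """Sum the unit estimates; each unit is (area, normal yield, annas)."""
    return sum(annawari_yield(a, n, c) for a, n, c in units)


# Hypothetical tehsils: (area in acres, normal yield in maunds/acre, annas)
tehsils = [(1000, 10.0, 12), (2000, 10.0, 16), (500, 8.0, 8)]
print(district_yield(tehsils))  # total estimated production in maunds
```

A 12-anna crop on 1000 acres with a normal yield of 10 maunds per acre thus contributes 7500 maunds, three quarters of the normal 10000.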

This method has been subjected to the following criticisms: 

(1) The selection of the plots and the estimation of the 
anna-condition are not based on any objective method but 
depend fully on the discretion of the experimenter. 

(2) The standard errors of the estimates cannot be 
measured. 

(3) The districts selected for crop-cutting experiments 
continue to be those which were approved some 50 
years back. 

(ii) Random Sampling Method : The method which is used 
these days was suggested by P. V. Sukhatme in Agricultural 
Situation in India, November, 1954. 

This method contains the following main steps: 

(1) The district is stratified into administrative sub-divisions 
(Pargana or Block etc.). 

(2) A number of villages, in proportion to the area under 
the crop, is selected at random from each 
sub-division. 

(3) Two fields are selected at random from each of the 
selected villages. 

(4) A plot of desired size and shape is located in each of 
the selected fields. 
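
Steps (1) to (3) can be sketched in code as follows. This is a minimal illustration and not the official procedure: the sub-division names, areas, village lists and field numbers are invented, and the rule of rounding the proportional share (with at least one village per sub-division) is an assumption.

```python
import random


def allocate_villages(crop_area_by_subdivision, total_villages):
    """Share out the village sample in proportion to area (assumed rule:
    round each share, and take at least one village per sub-division)."""
    total_area = sum(crop_area_by_subdivision.values())
    return {
        name: max(1, round(total_villages * area / total_area))
        for name, area in crop_area_by_subdivision.items()
    }


def select_sample(villages_by_subdivision, fields_by_village, allocation, rng):
    """Pick villages at random in each sub-division, then two fields per village."""
    sample = {}
    for subdivision, n_villages in allocation.items():
        for village in rng.sample(villages_by_subdivision[subdivision], n_villages):
            sample[village] = rng.sample(fields_by_village[village], 2)
    return sample


# Hypothetical district: three sub-divisions, 20 villages each, 30 fields per village
areas = {"Pargana A": 6000, "Pargana B": 3000, "Pargana C": 1000}
villages = {name: [f"{name} village {i}" for i in range(1, 21)] for name in areas}
fields = {v: list(range(1, 31)) for vs in villages.values() for v in vs}

allocation = allocate_villages(areas, total_villages=10)
sample = select_sample(villages, fields, allocation, random.Random(1954))
print(allocation)
print(len(sample), "villages selected, two fields in each")
```

Because every choice is drawn from a random number generator, no stage depends on the discretion of the investigator.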


On the basis of the yields of these plots, the estimates of the 
yield/acre and its standard error are computed. If $A_i$ stands for 
the area of the i th sub-division and $\bar{y}_i$ (estimated from the sample data) 
for the average yield in the i th sub-division, then the average 
district-yield $(y_d)$ is given by the formula: 

$$y_d = \frac{\sum A_i \bar{y}_i}{\sum A_i}$$


Further, if $S_v^2$ denotes the variance between the villages (in 
A. V. T.) and $n_i$ denotes the number of villages in the 
i th sub-division, then 

$$V(y_d) = \frac{\sum \dfrac{A_i^2}{n_i} \times S_v^2}{\left(\sum A_i\right)^2}$$
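
The two estimates can be illustrated numerically as below. This is a sketch under stated assumptions: the district yield is taken as the area-weighted mean of the sub-division means, and its variance is assumed to have the usual form for a weighted mean of independent sub-division means, namely the sum of A_i squared times S_v squared over n_i, divided by the square of the total area. The function names and all figures are hypothetical.

```python
import math


def district_mean_yield(areas, mean_yields):
    """Area-weighted mean: sum(A_i * y_i) / sum(A_i)."""
    return sum(a * y for a, y in zip(areas, mean_yields)) / sum(areas)


def district_variance(areas, n_villages, s2_v):
    """Assumed form: sum(A_i**2 * s2_v / n_i) / (sum(A_i))**2."""
    total_area = sum(areas)
    return sum(a * a * s2_v / n for a, n in zip(areas, n_villages)) / total_area ** 2


# Hypothetical sub-divisions
areas = [6000, 3000, 1000]      # A_i, acres under the crop
mean_yields = [10.0, 8.0, 12.0] # sample mean yield per acre in each sub-division
n_villages = [6, 3, 1]          # n_i, villages sampled
s2_v = 4.0                      # between-village variance (hypothetical)

y_d = district_mean_yield(areas, mean_yields)
se = math.sqrt(district_variance(areas, n_villages, s2_v))
print(y_d, se)  # district yield per acre and its standard error
```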


This method is more scientific and reliable as compared to the 
traditional (Annawari) method due to the following reasons: 

(1) The selection of the villages, of the fields within the 
selected villages and the location of the plots within 
the selected fields is done at random and does not 
depend on the discretion of the investigator at any stage 
of selection. Thus the estimates obtained from the 
random sampling experiments are unbiased. 

(2) The estimates are based on the modern statistical 
technique of sampling and not on the guess of the 
investigator as in the previous method. Therefore, the 
results are very near to the true values. 

(3) The standard errors of the estimated values can be 
measured. 

Yield-Estimation 

Size and Shape of the plot : The following table shows the 
size and shape of the plots which are adopted in U. P. for 
different crops: 

Name of crop      Shape                   Size 

(i) Sugarcane     square                  33' x 33' 

(ii) Jute         square                  16.5' x 16.5' 

(iii) Cotton      rectangle               66' x 33' 

(iv) Others       equilateral triangle    side 33' 
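
The plot sizes in the table determine the factor by which a plot yield must be multiplied to express it per acre. The sketch below works this out, taking 1 acre = 43,560 square feet; the function names and the sample plot yield are hypothetical.

```python
import math

SQ_FT_PER_ACRE = 43560  # 1 acre = 43,560 square feet

# Plot areas in square feet, from the sizes in the table
PLOT_AREA = {
    "Sugarcane": 33 * 33,                  # square, 33' x 33'
    "Jute": 16.5 * 16.5,                   # square, 16.5' x 16.5'
    "Cotton": 66 * 33,                     # rectangle, 66' x 33'
    "Others": math.sqrt(3) / 4 * 33 ** 2,  # equilateral triangle, side 33'
}


def per_acre_factor(crop):
    """Number of such plots contained in one acre."""
    return SQ_FT_PER_ACRE / PLOT_AREA[crop]


def yield_per_acre(plot_yield, crop):
    """Scale a plot yield (in any weight unit) up to an acre."""
    return plot_yield * per_acre_factor(crop)


for crop in PLOT_AREA:
    print(crop, round(PLOT_AREA[crop], 2), round(per_acre_factor(crop), 3))

# Hypothetical cut of 5 units of weight from a triangular plot
print(yield_per_acre(5.0, "Others"))
```

The square and rectangular plots are exact fractions of an acre (1/40, 1/160 and 1/20), while the triangular plot has an area of about 471.55 square feet, which gives a multiplier of 160/√3, roughly 92.4.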

Location of the plot within the field : To locate the plot 
within the field, the South-West (S. W.) corner of the field is chosen 
as the starting point and a peg is fixed there. The length and the 
width of the field are measured in steps from the 
starting point. Then two random numbers are selected such that 
the first does not exceed the number of steps along the length 
minus 13, and the second does not exceed the number of steps 
along the width minus 11. Let x and y be two such numbers, 
respectively along the length and the width. 

Now from the peg at the S. W. corner, x steps are 
measured along the length and y steps along the width 
(perpendicular to the length). A second peg is fixed at the 
point (x, y) so reached. If a triangular plot is to be located 
within the field, then one vertex of the equilateral triangular chain 
is kept fixed at the point (x, y) and the chain is stretched in such 
a way that one of its sides remains parallel to the length and points 
away from the starting point. A third peg is fixed at the second vertex 
of the chain. Finally the third vertex of the chain is moved 
along the direction of the width, away from the 
starting point, till the chain is fully stretched, and then a fourth 
peg is fixed at the third vertex of the chain. The equilateral 
triangle with vertices marked by the second, third and fourth pegs 
is the desired plot. 
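
The location rule can be sketched with coordinates measured in steps from the S. W. corner. The margins 13 and 11 come from the text; the lower limit of 1 for the random numbers, the side of 13 steps for the 33' chain, and the field size are assumptions made only for this illustration.

```python
import math
import random

LENGTH_MARGIN = 13  # steps subtracted from the length, as in the text
WIDTH_MARGIN = 11   # steps subtracted from the width, as in the text


def random_start(length_steps, width_steps, rng):
    """Random (x, y); the lower limit of 1 is an assumption."""
    x = rng.randint(1, length_steps - LENGTH_MARGIN)
    y = rng.randint(1, width_steps - WIDTH_MARGIN)
    return x, y


def triangle_pegs(x, y, side_steps=13.0):
    """Positions of the second, third and fourth pegs.

    side_steps is the 33' side expressed in steps (13 is assumed here).
    """
    second = (x, y)                    # fixed vertex at (x, y)
    third = (x + side_steps, y)        # side laid parallel to the length
    fourth = (x + side_steps / 2,      # apex, moved along the width
              y + side_steps * math.sqrt(3) / 2)
    return second, third, fourth


# Hypothetical field of 200 steps by 100 steps
x, y = random_start(200, 100, random.Random(0))
pegs = triangle_pegs(x, y)
print(pegs)
```

Subtracting the margins before drawing x and y keeps the side laid along the length inside the field, and the margin of 11 steps plays the same part for the height of the triangle along the width.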

Crop Fore-cast : For the purpose of administration and 
policy formulation, the knowledge of the acreages under important 
crops and their expected yields is essential. All this 
information, with a description of the general conditions of the 
crops, is published from time to time during the growth-period of 
the crops in the form of bulletins, one for each crop. These 
bulletins are known as the crop fore-casts. 

Three bulletins are issued for most of the crops. The first 
bulletin is issued one month after sowing the crop and 
contains the information regarding the area sown and the 
conditions of germination. The second bulletin is issued 
3 months after sowing and gives an idea of the crop-condition 
and the anticipated yield of the crop. The third bulletin is issued 
(published) about a month before the harvesting of the crop and 
gives an idea of the crop-yield to be harvested. 

As a matter of fact, the number of fore-casts depends upon 
the importance of the crop. For instance, the number of fore-casts 
is one only in the case of Castor, Ginger and Groundnut, and two for 
Jowar, Bajra, Maize and for Kharif and Rabi pulses. The number 
of fore-casts is 5 for Rice and Wheat and it is 6 for Sugar-cane. 

The important crops for which the crop fore-casts are not 
made so far, are Tobacco and Potato but the plan for their 
fore-cast is under consideration. 

Live stock : At present a quinquennial live stock census, 
including details on agricultural implements and machinery of all 
types, is held in every part of the country. The collected 
information is published in India Live Stock Census. The last 
census was conducted in 1961 on an improved basis. Uniform 
definitions were adopted by all the states, and live stock census 
officers were appointed in each state. Provision was also made 
for rationalized supervision and for training of enumerators and 
supervisors. 

In 1961, live stock was classified into two broad groups : 

(i) Bovine and (ii) Other. 

The bovines were classified as cattle and buffaloes and further 
classified according to the sex and then age. Other live stock 
includes sheep, goats, horses and ponies, mules, donkeys, camels 
and pigs. The sheep, goats, horses and ponies are classified 
according to the age and sex while the donkeys and pigs are 
classified according to age only. 

In the same year i.e. in 1961, the poultry was classified as : 

(i) Fowls (ii) Ducks and (iii) Others. 

The Fowls include hens, cocks and chickens and Ducks include 
ducks, ducklings and drakes. 

Fisheries Statistics 

The Fisheries Statistics are very inadequate and highly 
unorganised. The available data can be classified in the following 
four classes : 

(1) Data available in market-reports. 

(2) Data available with Fisheries Research Institutes and 
stations, namely : 

(a) Central Inland Fisheries Research Station at Calcutta, 

(b) Central Marine Fisheries Research Station at Mandapam 
Camp, 

(c) Deep Sea Fisheries Station at Bombay, 

(d) Offshore Stations at Tuticorin, Cochin and Visakhapatnam, 

(e) Central Fisheries Technological Research Station at 
Cochin and 

(f) Fisheries Extension Units. 

(3) Data available with Fisheries Development Adviser and in 
State Gazettes. 

(4) Data about consumption of fish collected by N. S. S. 

Our Five Year Plans have the provision for the development of 
fisheries in India. The F. A. O. is also helping in the development 
schemes. It is hoped that in future the position of Fisheries 
Statistics will be much improved in India. 

Shortcomings and Improvements in 
Agricultural Statistics 

The main defects of Indian Agricultural Statistics may be 
classified under the following heads : 

(i) Gaps in coverage ; 

(ii) Lack of uniformity in definitions ; 

(iii) Defects of primary reporting agency ; 

(iv) Defects of tabulation and processing ; 

(v) Defects of inspection, supervision and checking ; 

(vi) Defects of planning and co-ordination and 

(vii) Delay in publication. 

(i) Gaps in Coverage : Agricultural Statistics are not available 
for the whole geographical area of India. For several million 
acres in Rajasthan, Gujarat, Assam and Kashmir, figures are not 
available. Some of the areas are also not cadastrally surveyed 
and hence agricultural statistics in these areas are not properly 
organised. Further, there are certain areas where reporting 
agencies do not exist and as such estimates for a number of crops 
are not available. The agricultural statistics of India can be said 
to be complete when estimates of acreage under different crops 
and land-use classes become known in respect of the whole 
geographical area of the country. 

During the last few years steps have been taken to remedy the above 
defects. The reporting area has been increased ; reporting agencies 
have been set up in the areas where they did not exist ; and regular 
estimates are being published for several minor crops. 

(ii) Lack of uniformity in definitions : The methods of 
obtaining the area statistics are different in temporarily settled 
and permanently settled areas. The position has improved to a 
considerable extent in Bihar and Bengal, and steps are being taken 
to improve the position in Orissa. 

The definitions of the different land-use classes depended on 
local customs and usages, and were not, therefore, uniform. For 
example, the definition of current fallows varied in different states. 
Now the Government of India have increased the number of 
land-use classes and have also laid down uniform definitions for all 
the states in India. 

The methods of yield estimation are also not uniform 
throughout the country, as in some states the Annawari system 
is used while in Punjab the Method of Direct Estimation is 
followed. For some of the crops the Random Sampling Technique 
is used in some states. But attempts are being made to follow 
one common method of estimation in all the states 
of the country. 

(iii) Defects in tabulation and processing : 

A good deal of the data collected is rendered useless, as no 
processing is done. For instance, in Punjab, Delhi and Madhya 
Pradesh (M. P.) the information regarding the transfers of 
agricultural properties is collected for each village by the lekhpals, 
but it is not consolidated for the whole state. The case is similar 
in other states. 

(iv) Defects of the primary reporting agencies : 

The defects in this connection have already been discussed 
both for the temporarily settled areas and the permanently settled 
areas. The area statistics are comparatively more reliable in the 
temporarily settled areas. Even if the area recorded by the lekhpal is 
correct, his classification for different crops may not be correct. 
One of the chief causes for the defects in the reporting of the 
primary agency is the heavy work-load of the lekhpals. State 
Governments are now taking steps to reduce the load of the 
lekhpals. Sometimes the bias of the primary reporting agency also 
accounts for the defect in the figures. 

(v) Lack of Supervision : The work of the lekhpals is 
inspected by their immediate officers, Kanungoes and Tehsildars. But 
on account of their heavy administrative duties they are not always 
able to devote personal attention to such inspection work. This 
indifference induces the lekhpals to be negligent in their duties. 
State Governments are now insisting on better supervision and 
checking of the work of the primary reporting agency. Sometimes 
the villages for inspection are selected by random sampling methods. 
The work of the Kanungoes is also being reduced to make their 
supervision and checking better. 

(vi) Lack of Co-ordination : Sometimes the data on the same 
subject are collected by two or more agencies. For example, in 
some states the department of agriculture and that of civil supplies 
obtain the estimates of food production independently of each other. 
The former department is concerned with the increase of yield 
of the crops, while the latter is concerned with procurement and equitable 
distribution of food. Often, the data collected are different and 
hence there is a great need of co-ordination in such cases. 

(vii) Delay in Publication : The data are first collected in 
the books (registers) of lekhpals and are consolidated first for each 
village, then for each tehsil, then for each district, then for the state as 
a whole ; and each state then sends its consolidated returns to the 
Directorate of Economics & Statistics at the centre for consolidation 
and publication on an all India basis. In this system, if a delay 
occurs at village or tehsil level, it causes delay in publication at 
the all India level. Hence it is often suggested that there should be 
closer insistence on punctuality at every stage of this system. 

EXERCISE No. 3 

Q. No. (1) : Write short notes on the following 

(a) Improvement of agricultural statistics in India. 

(M. Sc. Ag. Agra, 1961) 

(b) Crop-cutting experiments in U. P. 

( M. Sc. Ag. Agra 1962) 

(c) Live stock census; (M. Sc. Ag. Agra, 1962, 1964) 

Q. No. (2) 

(a) Discuss the defects of annawari estimation of crop 
production. 

(b) In a random crop-cutting survey for estimation of 
yield of wheat crop describe step by step the procedure for locating 
a 33' equilateral triangle for sample cut in a wheat field. The 
yields from 2 such sample cuts are 6 seers 1 ch. and 11 seers 
respectively. Express these yields on a per acre basis. 

(M. Sc. Ag. Agra 1963) 

Q. No. (3) : Write short notes on the following 

(a) Season and crop report of U. P. (M. Sc. Ag. Agra 1963) 

(b) Importance of agricultural statistics for any country, 

(M Sc. Ag. Agra 1965) 

(c) Recent improvements in the sphere of— 

(i) land utilization statistics, 

and (ii) yield and production statistics. 

( M. Sc. Ag. Agra 1965 ) 

(d) Sample cut of 33' equilateral triangle. 

Q. No. (4) : What is the 'Season and Crop Report of U. P.' ? Who 
publishes it, and what is the frequency of its publication ? List 
the types of information contained in this publication and discuss 
the utility of the same in a planned economy. 

(M. Sc. Ag. Agra 1964) 

Q. No. (5) : The shape of a wheat field is nearly rectangular 
and the measures of its length and breadth from the south-west 
are 212 steps and 98 steps respectively. Explain clearly with the 
help of a diagram how you would locate a 33' equilateral triangle 
for crop cutting experiment in this field ? ( M . Sc. Ag. Agra 1964) 

Q. No. (6) : Write an essay on agricultural statistics. 